ABSTRACT ALGEBRA WITH 
APPLICATIONS 


Irwin Kra, State University of New York at Stony Brook 


and University of California at Berkeley 


Contents 


Introduction
Standard Notation and Commonly Used Symbols
Chapter 1. The integers
1. Introduction
2. Induction
3. The division algorithm: gcd and lcm
4. Primes
5. The rationals, algebraic numbers and other beasts
5.1. The rationals, Q
5.2. The reals, R
5.3. The complex numbers, C
5.4. The algebraic numbers
5.5. The quaternions, H
6. Modular arithmetic
7. Solutions of linear congruences
8. Euler
9. Public key cryptography
10. A collection of beautiful results
Chapter 2. Foundations
1. Naive set theory
2. Functions
3. Relations
4. Order relations on Z and Q
4.1. Orders on Z
4.2. Orders on Q
5. The complex numbers
Chapter 3. Groups
1. Permutation groups
2. The order and sign of a permutation
3. Definitions and more examples of groups
Chapter 4. Group homomorphisms and isomorphisms
1. Elementary group theory
2. Lagrange’s theorem
3. Homomorphisms
4. Groups of small order
4.1. |G| = 1
4.2. |G| = 2, 3, 5, 7 and, in fact, all primes
4.3. |G| = 4
4.4. |G| = 6
4.5. |G| = 8
5. Homomorphisms and quotients
6. Isomorphisms
6.1. Every group is a subgroup of a permutation group
6.2. Solvable groups
6.3. MORE sections to be included
Chapter 5. Algebraic structures
1. A collection of algebraic structures
2. The algebra of polynomials
2.1. The vector space of polynomials of degree n
2.2. The Euclidean algorithm (for polynomials)
2.3. Differentiation
3. Ideals
3.1. Ideals in commutative rings
3.2. Ideals in Z and C[z]
4. CRT revisited
5. Polynomials over more general fields
6. Fields of quotients and rings of rational functions
Chapter 6. Error correcting codes
1. ISBN
2. Groups and codes
Chapter 7. Roots of polynomials
1. Roots of polynomials
1.1. Derivatives and multiple roots
2. Circulant matrices
3. Roots of polynomials of small degree
3.1. Roots of linear and quadratic polynomials
3.2. The general case
3.3. Roots of cubics
3.4. Roots of quartics
3.5. Real roots and roots of absolute value 1
3.6. What goes wrong for polynomials of higher degree?
Chapter 8. Moduli for polynomials
1. Polynomials in three guises
2. An example from high school math: the quadratic polynomial
3. An equivalence relation
4. An example all high school math teachers should know: the cubic polynomial
5. Arbitrary real or complex polynomials
6. Back to the cubic polynomial
7. Standard forms for cubics
8. Solving the cubic
9. Solving the quartic
10. Concluding remarks
11. A moduli (parameter) count
Chapter 9. Nonsolvability by radicals
1. Algebraic extensions of fields
2. Field embeddings
3. Splitting fields
4. Galois extensions
5. Quadratic, cubic and quartic extensions
5.1. Linear extensions
5.2. Quadratic extensions
5.3. Cubic extensions
5.4. Quartic extensions
6. Nonsolvability
Bibliography
Index


Introduction 


This book is closest in spirit to [7]. Except for Chapters 7 and 9¹, where the reader will
need some results from linear algebra (which are reviewed), this book requires no formal
mathematics prerequisites. Readers should, however, possess sufficient mathematical
sophistication to appreciate a logical argument and what constitutes a proof. More than enough
information on these topics can be found in [10].

The reader should be aware of the following features of the book that may not be stan- 
dard. 


• I have cut the book down to a bare minimum. If a reader is interested in a given
chapter or it is part of a mathematics course, then every word in it should be read
and understood. When requested, all the details should be filled in and all exercises
and problems done (their content may be needed in subsequent parts of the “main”
text).

• At times I use a “familiar” concept before it is formally defined, as in Example 1.3.

• I use italics for terms defined, either formally in definitions or informally during a
proof or discussion.

• Most nontrivial calculations and nontrivial management of sets as well as certain
algebraic manipulations are performed using the symbolic manipulation programs
MAPLE or MATHEMATICA.

• MAPLE and MATHEMATICA worksheets are included both in the text and on an
accompanying disc; this latter format will permit easy program modifications by
the reader for further exploration and experimentation. This is not a textbook on
MAPLE nor on MATHEMATICA. See [3] for such a treatise. Rather, these programs
are used as tools to learn and do mathematics. I have tried to use only very
simple MAPLE and MATHEMATICA programs and routines and to use, whenever
possible, commands that are similar to ordinary mathematical expressions and
formulae.

• I have tried to keep a reasonable mixture between formal proofs and informality
(claims that certain statements are “obvious”).


This book is an introduction to abstract algebra. I have particularly tried to pay attention 
to the needs of future high school mathematics teachers. With this in mind I have chosen 
applications such as public key cryptography and error correcting codes which use basic 
algebra as well as a study of polynomials and their roots which is such a big part of pre- 
college mathematics. 

Portions of the material in this book were used as a basis for courses taught at Stony
Brook and at Berkeley. The students challenged me with good questions and suggestions. I


¹ The tone and level of mathematical sophistication of these two chapters are considerably
different from those of the others. Much more background is expected from the reader interested
in these chapters.




am very grateful to the students who read the material, corrected errors, and pointed out 
ways for improving the exposition. Errors, of course, remain and are the responsibility of 
the author. 


Standard Notation and Commonly Used Symbols 


A LIST OF SYMBOLS 


TERM                                      MEANING
$\mathbb{Z}$                              integers
$\mathbb{Z}_n$                            congruence classes of integers modulo $n$
$\mathbb{Z}_n^*$                          the units (invertible elements) in $\mathbb{Z}_n$
$\mathbb{Q}$                              rationals
$\mathbb{R}$                              reals
$\mathbb{C}$                              complex numbers
$|a|$                                     the absolute value of the number $a$
$\gcd(a_1, a_2, \ldots, a_n) = (a_1, a_2, \ldots, a_n)$   the greatest common divisor of the integers $a_1, a_2, \ldots, a_n$
$\mathrm{lcm}(a_1, a_2, \ldots, a_n)$     the least common multiple of the integers $a_1, a_2, \ldots, a_n$
$[a]_n$                                   the congruence class modulo $n$ containing the integer $a$
$i$                                       a square root of $-1$
$\Re z$                                   real part of the complex number $z$
$\Im z$                                   imaginary part of the complex number $z$
$z = x + iy$                              $x = \Re z$ and $y = \Im z$
$\bar{z}$                                 conjugate of the complex number $z$
$r = |z|$                                 absolute value of the complex number $z$
$\theta = \arg z$                         an argument of the complex number $z$
$z = re^{i\theta}$                        $r = |z|$ and $\theta = \arg z$
$|R|$                                     cardinality of set $R$
$X_{\text{condition}}$                    the set of $x \in X$ that satisfy condition
$\varphi(n)$                              the Euler $\varphi$-function evaluated at the positive integer $n$
$\mathrm{ord}\,[a]_n$                     the order of the congruence class $[a]_n$
$a|b$                                     the integer $a$ divides the integer $b$
$\mathrm{red}_n$                          reduction of integers modulo $n$
$\ker(\theta)$                            kernel of homomorphism $\theta$
$\mathrm{Im}(\theta)$                     image of homomorphism $\theta$
$R^*$                                     the units (invertible elements) in the ring $R$
$R[x]$                                    polynomial ring over the commutative ring $R$
$F(\alpha)$                               smallest subfield of $\mathbb{C}$ containing $F$ and $\alpha$
$F(x)$                                    the field of rational functions for the field $F$




STANDARD TERMINOLOGY 


TERM                                      MEANING
LHS                                       left hand side
RHS                                       right hand side
elements of sets                          usually denoted by lower case letters
sets                                      usually denoted by upper case letters
iff                                       if and only if
$\subset$                                 proper subset
$\subseteq$                               subset, may not be proper
$a \in A$                                 the element $a$ is a member of the set $A$
$a \notin A$                              the element $a$ is not a member of the set $A$
$\emptyset$                               the empty set
$|A|$                                     the cardinality of the set $A$
$A \cup B$                                the union of the sets $A$ and $B$
$A \cap B$                                the intersection of the sets $A$ and $B$
$A^c$                                     the complement of the set $A$
$A - B$                                   $A \cap B^c$
$\{x \in X; \text{condition}\}$           the elements of $X$ that satisfy condition


CHAPTER 1 


The integers 


All of us have been dealing with integers from a very young age. They have been studied
by mathematicians for thousands of years. Yet much about them is unknown, and most
people, though they have used integers consistently throughout their education, have not
paid much attention to their basic properties. Only in 2003 was it proven that it does not
take too long to decide whether an integer is a prime or not. It is still unknown whether one
can factor an integer (into its prime factors) in a reasonably short time, although the belief
is that it cannot be done in what is called “polynomial time.” It is also surprising, perhaps,
that in addition to their obvious role in counting and recording of data, the integers have
deep applications to everyday life. The next to the last section of the chapter describes a
public key encryption system that allows secure communication (on the INTERNET, for
example) and that is based on a beautiful theorem of Euler and the fact that it is very hard
to factor large integers. The last section contains a small collection of results that I found
fascinating; some of them will be needed in subsequent chapters of the book.


1. Introduction 
In this chapter, we study properties of the set of integers
    $\mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$
and the subset $\mathbb{N} \subset \mathbb{Z}$ of natural numbers or non-negative integers
    $\mathbb{N} = \{0, 1, 2, \ldots\}$.


We will assume that the reader is familiar with elementary logic, set theoretic notation
(reviewed in §1 of Chapter 2), and the basic properties of the binary operations of addition
($+$) and multiplication ($\cdot$) and the order relation¹ less than or equal ($\le$) on the integers.
Thus our basic object of study is the quadruple


    $(\mathbb{Z}; +, \cdot, \le)$.


Three other (but related) order relations are associated to $\le$: less than $<$ (meaning $\le$ but
$\ne$), greater than or equal $\ge$ (meaning $\not<$) and greater than $>$ (meaning $\ge$ but $\ne$). It is
convenient to introduce some more notation. For all $a \in \mathbb{Z}$, we let


    $\mathbb{Z}_{\le a} = \{b \in \mathbb{Z}; b \le a\}$.


The sets $\mathbb{Z}_{<a}$, $\mathbb{Z}_{\ge a}$ and $\mathbb{Z}_{>a}$ are defined in a similar manner. In this notation
$\mathbb{N} = \mathbb{Z}_{\ge 0} = \mathbb{Z}_{>-1}$.
Although we do not discuss the basic properties of this system, we emphasize one: the
principle stated next. It will be converted in the next section into a property that we will use
throughout this book.


¹ Relations are discussed in Chapter 2. As seen in that chapter, the four order relations on the integers
are defined in terms of the additive group $(\mathbb{Z}, +)$ and the subset $\mathbb{N} \subset \mathbb{Z}$.




THE WELL ORDERING PRINCIPLE: If $S \subset \mathbb{Z}$ is bounded from below (that is, there
exists a $b \in \mathbb{Z}$ such that $b \le s$ for all $s \in S$) and $S \ne \emptyset$ (that is, it contains some elements),
then there exists a least or smallest element² in $S$ (that is, there exists an $a \in S$ such that
$a \le s$ for all $s \in S$, and if also $b \le s$ for all $s \in S$, then $a \ge b$); in particular, every nonempty
set of nonnegative integers contains a smallest element.


EXERCISES 


(1) Show that the least element of a non empty set of integers that is bounded from 
below is unique. 

(2) Formulate the concept of sets of integers being bounded from above and translate 
the WELL ORDERING PRINCIPLE to such sets. Prove the translation. 


2. Induction 


One of the most powerful tools at our disposal will turn out to be a reformulation of the 
last principle into one that will be illustrated with simple examples in this section and will 
be used extensively throughout the book. The well ordering principle is equivalent to 


THE INDUCTION PRINCIPLE: Let $a \in \mathbb{Z}$ and assume that for each $n \in \mathbb{Z}_{\ge a}$, we have
a statement $P(n)$. If $P(a)$ is true, and if for all $k > a$, $P(k)$ is true whenever $P(k-1)$ is
true, then $P(n)$ is true for all $n \in \mathbb{Z}_{\ge a}$.


We begin with an informal example to illustrate the above principle. 


EXAMPLE 1.1. Let us assume that we have infinitely many dominoes lined up in a straight
line. We are ignoring all kinds of technicalities: for example, exactly what it means to be
lined up in a straight line, how we order or number the dominoes (say they are numbered 1,
2, 3, ...), the sizes of the dominoes (they are all the same), the distances between dominoes
(they should be small in relation to the sizes of the dominoes), etc. We claim that if we
push the first domino so that in falling it hits the second one, then all the dominoes will fall
down. The first domino certainly falls down. For induction we assume that the $n$th domino
has fallen down. In doing so, it pushes (hits) the $(n+1)$st domino, causing it also to fall. We
conclude that all of the dominoes fall down.

In working with the principle of mathematical induction, there is always a collection of
statements, usually an infinite number, and we are trying to prove that each statement is
true. In the above example the statements are “the $k$th domino falls”, one for each positive
integer $k$. Thus we are trying to establish the validity of an infinite collection of statements.
The first statement is true, since we push the first domino to fall (and in falling it pushes
the second). The induction principle allows us to assume the truth of the $n$th statement ($n$
is an ARBITRARY positive integer) and requires us to establish the $(n+1)$st statement. If
we do so, we conclude that each statement is true.


WELL ORDERING and INDUCTION are equivalent PRINCIPLES. We show first that 
WELL ORDERING implies INDUCTION. Let 


    $S = \{n \in \mathbb{Z}_{\ge a}; P(n) \text{ is not true}\}$.


² In the language of analysis (calculus) courses and books, the least element of $S$ is its minimum, infimum,
or greatest lower bound.




Then obviously $S \subset \mathbb{Z}_{\ge a}$. If $S \ne \emptyset$, then by the well ordering principle it would contain a
smallest element $b$. But $b \ne a$ since $a \notin S$. Thus $b > a$ and $b - 1 \in \mathbb{Z}_{\ge a}$ but $b - 1 \notin S$. Hence
$P(b-1)$ is true. The induction hypothesis guarantees that under these circumstances $P(b)$
is also true. Thus $b$ could not belong to $S$; we have arrived at a contradiction, and the set
$S$ must be empty.

To establish the converse, that INDUCTION implies WELL ORDERING, assume that
$S \subset \mathbb{Z}$, that $S \ne \emptyset$ (let $a \in S$) and that for some $b \in \mathbb{Z}$, $b \le s$ for all $s \in S$. Assume that $S$
does not contain a least element. Let $P(n)$, $n \in \mathbb{Z}_{\ge (b-1)}$, be the statement that $\mathbb{Z}_{\le n} \cap S = \emptyset$.
Then $P(b-1)$ is true because $S \subset \mathbb{Z}_{\ge b}$. Let $k > (b-1)$. If $P(k-1)$ were true, then so would
be $P(k)$ because otherwise $k$ would be a least element of $S$. So by induction, $\mathbb{Z}_{\le n} \cap S = \emptyset$
for all $n \in \mathbb{Z}$, $n \ge (b-1)$. But this contradicts that $a \ge b$ and $a \in S$.


The well ordering principle (and hence also the induction principle) is equivalent to 


THE STRONG INDUCTION PRINCIPLE: Let $a \in \mathbb{Z}$ and assume that for each $n \in \mathbb{Z}_{\ge a}$,
we have a statement $P(n)$. If $P(a)$ is true, and if for all $k > a$, $P(k)$ is true whenever $P(j)$
is true for all integers $j$ with $a \le j \le (k-1)$, then $P(n)$ is true for all $n \in \mathbb{Z}_{\ge a}$.


We leave it to the reader to verify the equivalence of the two forms of induction. 


We proceed to two examples of the use of induction to prove elementary results. 
EXAMPLE 1.2. For $n \in \mathbb{Z}_{>0}$, evaluate the sum of the first $n$ positive integers.


PROOF. We are required to evaluate

    $\sum_{i=1}^{n} i = 1 + 2 + \cdots + n$.

We first derive a formula for the sum. Notice that the first and last terms add up to $n+1$. So
do the second and second from the end, the third and third from the end, etc. By grouping
appropriate terms we have produced $\frac{n}{2}$ groups each adding up to $n + 1$ (this statement is
correct even for odd $n$ when appropriately interpreted). Thus

    (1)    $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$.

For the second proof of the last formula, let us assume that through some process we have
reached the conjecture that (1) is true for each positive integer $n$. An induction argument
can turn the conjecture into a theorem. In this case $P(n)$, for $n = 1, 2, 3, \ldots$, is the identity
or equation (1). The base case $n = 1$ is certainly correct. Assume now that $k > 1$ and that
the formula holds for $k - 1$ (that $P(k-1)$ is true); then

    $\sum_{i=1}^{k} i = \left(\sum_{i=1}^{k-1} i\right) + k = \frac{(k-1)k}{2} + k = \frac{k^2 - k + 2k}{2} = \frac{k(k+1)}{2}$;

that is, the formula for the sum also holds for $k$ ($P(k)$ is true). The induction principle
allows us to conclude that (1) holds for all $n \in \mathbb{Z}_{>0}$.
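
The two arguments of Example 1.2 can be checked numerically. The following Python sketch is an illustration only (Python is not one of the programs used in this book, and the function names are made up for this example); it compares the pairing argument, the closed form (1), and a direct summation.

```python
def sum_by_pairing(n: int) -> int:
    """Sum 1 + 2 + ... + n by pairing the i-th term with the i-th term from the end."""
    total = 0
    lo, hi = 1, n
    while lo < hi:
        total += lo + hi      # each pair adds up to n + 1
        lo += 1
        hi -= 1
    if lo == hi:              # odd n: the middle term has no partner
        total += lo
    return total


def sum_by_formula(n: int) -> int:
    """Closed form (1): n(n + 1)/2, which is always an integer."""
    return n * (n + 1) // 2


# Compare both methods with a direct summation for many values of n.
for n in range(1, 200):
    assert sum_by_pairing(n) == sum_by_formula(n) == sum(range(1, n + 1))
```

A finite check of this kind is evidence, not a proof; only the induction argument establishes (1) for every positive integer.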


EXAMPLE 1.3. The product of any three consecutive integers is divisible by 3. 




REMARK 1.4. Formally, this problem should appear only after we have discussed divisi- 
bility in the next section. We assume the reader remembers from high school mathematics 
elementary properties of division of integers. 


PROOF. We are asked to show that for all $n \in \mathbb{Z}$, $3 | n(n+1)(n+2)$. Let us use induction
to establish the last assertion for all integers $n \ge -2$. The base case $n = -2$ certainly is
true. So let us take $k > -2$, and assume that $3 | (k-1)k(k+1)$. We need to show from this
assumption that $3 | k(k+1)(k+2)$. We compute

    $k(k+1)(k+2) - (k-1)k(k+1) = k(k+1)(k+2-k+1) = 3k(k+1)$.

Certainly $3 | 3k(k+1)$ and hence the induction assumption that $3 | (k-1)k(k+1)$ guarantees
that $3 | k(k+1)(k+2)$ as required, since the sum of two integers divisible by 3 is certainly
also divisible by 3. We are left to consider the case $n < -2$. Notice that

    $n(n+1)(n+2) = -(-n)(-n-1)(-n-2)$,

and that for any integer $a$, $3 | a$ if and only if³ $3 | (-a)$. Finally observe that $n < -2$ if and only
if $-n-2 > 0 > -2$. $\Box$
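
The key identity of the induction step, and the conclusion for negative as well as non-negative integers, can be spot-checked with a short Python sketch. It is an illustration under the assumption that a finite range of test values is acceptable; the function name is hypothetical.

```python
def product_of_three(n: int) -> int:
    """The product n(n + 1)(n + 2) of three consecutive integers starting at n."""
    return n * (n + 1) * (n + 2)


for k in range(-50, 51):
    # Identity used in the induction step: the difference of successive products is 3k(k + 1).
    assert product_of_three(k) - product_of_three(k - 1) == 3 * k * (k + 1)
    # The conclusion of Example 1.3, for negative and non-negative k alike.
    assert product_of_three(k) % 3 == 0
```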


EXERCISES 
(1) (a) Show that the product of any three consecutive integers is divisible by 6.
(b) Show that for every positive integer $n$, $n^5 - n$ is divisible by 5.
(c) Show that for every positive integer $n$, $3^{2n} - 1$ is divisible by 8.
(2) Prove that for all positive integers $n$,
    $1^2 + 2^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6}$.


(3) Do the next worksheet. 

(4) This problem gives a different way to determine the function $p(n)$ of the worksheet
below and hence a way to establish the formula for the sum of cubes. As a consequence
of the first two items of that worksheet, it is reasonable to conjecture that
we have the following identity valid for all $n \in \mathbb{Z}_{>0}$

    $\sum_{i=1}^{n} i^3 = an^4 + bn^3 + cn^2 + dn + e$

for some constants $a$, $b$, $c$, $d$ and $e$. Evaluate these constants by expressing the sum
of the first $n + 1$ cubes in two different ways; that is, start with

    $\sum_{i=1}^{n+1} i^3 = \left(\sum_{i=1}^{n} i^3\right) + (n+1)^3$.

Justify this last formula and then use it to evaluate the five constants. Use the
last calculation as a basis for an induction argument to prove the conjecture (with
appropriate values for the 5 constants).


WORKSHEET # 1. 
This worksheet provides a leisurely way to arrive at a formula for the sum of cubes of 
integers. It is also an introduction to the use of MAPLE. 


³ Abbreviated in many displayed equations as iff.


(1) (Sums of integers.) Recall that we proved (in the text) by induction that for all
positive integers $n$,

    $1 + 2 + \cdots + n = \frac{n(n+1)}{2}$.

(2) (Sums of squares of integers.) Similarly we proved (in the exercises) by induction
that for all positive integers $n$,

    $1^2 + 2^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6}$.

(3) (Sums of cubes of integers.) The aim of this worksheet is to formulate and then
prove a similar result for sums of cubes. We follow a leisurely path.

(4) Notice that the sum of the first $n$ positive integers is a quadratic polynomial in $n$.
The sum of the squares of the first $n$ positive integers is a cubic polynomial in $n$.
It is hence reasonable to expect that the sum of the cubes of the first $n$ positive
integers is a fourth degree polynomial in $n$; that is,

    (2)    $1^3 + 2^3 + \cdots + n^3 = an^4 + bn^3 + cn^2 + dn + e$,

for some constants $a$, $b$, $c$, $d$ and $e$ that do not depend on the variable $n$. What are
the corresponding constants for sums of integers and sums of squares of integers?
Can you make some “educated guesses” about what the 5 constants should be?

(5) If we are not to rely on guesswork nor on inspiration, then one of our tasks is to
determine the 5 constants. If equation (2) is to hold for all integers $n$, it certainly
should hold for $n = 1, 2, 3, 4$ and $5$, leading us to five equations

    $1 = a + b + c + d + e$,
    $9 = 16a + 8b + 4c + 2d + e$,
    $36 = 81a + 27b + 9c + 3d + e$,
    $100 = 256a + 64b + 16c + 4d + e$

and

    $225 = 625a + 125b + 25c + 5d + e$.

If our intuition is right, the above system of linear equations should have a unique
solution. Recall from your linear algebra course that a necessary and sufficient
condition for the above system of equations to have a unique solution is that the
matrix

    $\begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ 16 & 8 & 4 & 2 & 1 \\ 81 & 27 & 9 & 3 & 1 \\ 256 & 64 & 16 & 4 & 1 \\ 625 & 125 & 25 & 5 & 1 \end{pmatrix}$

be nonsingular. One could certainly compute its determinant by hand and show
that it is non-zero. Do it using MAPLE or MATHEMATICA. You should get that
the determinant equals 288.

(6) Now use MAPLE or MATHEMATICA to solve the system of equations. You should
have obtained a polynomial $p(n)$ with rational coefficients. You are trying to prove
by induction, because so far we have no guarantee that the equation is correct, that

    $1^3 + 2^3 + \cdots + n^3 = p(n)$

for all positive integers $n$.

(7) Let’s make the polynomial look prettier. First write $p(n)$ as $\frac{P(n)}{N}$, where $P(n)$ is
a polynomial with integer coefficients and $N$ is a positive integer, chosen as small as
possible. Now factor the polynomial $P(n)$. The formula you now need to establish
for sums of cubes should appear similar to the ones for sums of integers and sums of
squares. Prove by induction that the formula you obtained is true; this finishes
the exercise.

(8) To get used to working with symbolic manipulation programs you may want, after
attempting by yourself the steps outlined above, to consult the MAPLE program
following this worksheet that outlines the commands needed to perform the
calculations. There is a very nontrivial initial investment of time in learning to use a
program of this kind. But, if one needs to do many symbolic calculations, it pays
off in the long run.

(9) Were your “educated guesses” about the values of the 5 constants close to the
mark?

(10) Note that MAPLE has a command that evaluates $p(n)$ directly.

(11) Can you formulate and prove a similar result for sums of fourth powers of integers?

MAPLE SESSION #1.


(Most MAPLE warnings were suppressed in this and other printouts.)


> a := Matrix([[1,1,1,1,1],[16,8,4,2,1],[81,27,9,3,1],[256,64,16,4,1],[625,125,25,5,1]]);

              [   1     1    1   1   1 ]
              [  16     8    4   2   1 ]
         a := [  81    27    9   3   1 ]
              [ 256    64   16   4   1 ]
              [ 625   125   25   5   1 ]

> with(linalg);
  det(a);

         288

> b := Vector[column]([1,9,36,100,225]);

              [   1 ]
              [   9 ]
         b := [  36 ]
              [ 100 ]
              [ 225 ]

> linsolve(a,b);

         [1/4, 1/2, 1/4, 0, 0]

> poly := (y^4 + 2*y^3 + y^2)/4;

> p := 4*poly;

         p := y^4 + 2 y^3 + y^2

> factor(p);

         y^2 (y + 1)^2

> sum(k^3, k=1..n);

         1/4 (n + 1)^4 - 1/2 (n + 1)^3 + 1/4 (n + 1)^2

> simplify(%);

         1/4 n^4 + 1/2 n^3 + 1/4 n^2

> factor(%);

         1/4 n^2 (n + 1)^2

***END OF PROGRAM***


We follow this and, as appropriate, most other MAPLE and MATHEMATICA sessions 
with some explanatory remarks. 


(1) The first and third commands of the program enter the $5 \times 5$ matrix $a$ and the
column vector $b \in \mathbb{R}^5$, respectively.

(2) The second command introduces the linear algebra package (a technical MAPLE
requirement) and computes the determinant of the matrix $a$.

(3) Since $\det a \ne 0$, the equation $ax = b$ is solvable. The solution is obtained by the
fourth command.

(4) The next three commands obtain the polynomial p. 

(5) The last three commands use MAPLE commands to directly evaluate the sum of 
cubes. 

(6) Note that MAPLE (the version used here) employs the symbol % to denote the 
result of its last calculation. 
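
For readers without access to MAPLE, the linear system of the worksheet can also be solved exactly with rational arithmetic. The Python sketch below is not part of the MAPLE session; `solve_linear_system` is a hypothetical helper written for this illustration, and it assumes that the matrix is nonsingular (which the determinant computation above confirms).

```python
from fractions import Fraction


def solve_linear_system(matrix, rhs):
    """Gauss-Jordan elimination over the rationals; assumes the matrix is nonsingular."""
    size = len(matrix)
    # Augmented matrix with exact rational entries.
    rows = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(matrix, rhs)]
    for col in range(size):
        # Find a row with a nonzero entry in this column and move it into place.
        pivot = next(r for r in range(col, size) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [x / pivot_value for x in rows[col]]
        # Clear the column in every other row.
        for r in range(size):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


# Row n of the matrix is (n^4, n^3, n^2, n, 1); the right-hand side is 1^3 + ... + n^3.
a = [[n**4, n**3, n**2, n, 1] for n in range(1, 6)]
b = [sum(k**3 for k in range(1, n + 1)) for n in range(1, 6)]
coefficients = solve_linear_system(a, b)

# The solution agrees with the output of linsolve in the session above.
assert coefficients == [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4), 0, 0]
```

Exact fractions are used instead of floating point numbers so that the coefficients come out as the rational numbers appearing in the polynomial $p(n)$.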


MATHEMATICA SESSION #1 


In the interactive MATHEMATICA session (notebook) reproduced below we study sums of
4th powers of integers. Two avenues are explored.


Sum[k^4, {k, 2}]
    17

Sum[k^4, {k, n}]
    (1/30) n (1 + n) (1 + 2n) (-1 + 3n + 3n^2)

% + (n + 1)^4
    (1 + n)^4 + (1/30) n (1 + n) (1 + 2n) (-1 + 3n + 3n^2)

Simplify[%]
    (1 + n)^4 + (1/30) n (1 + n) (1 + 2n) (-1 + 3n + 3n^2)

Expand[%]
    1 + 119n/30 + 6n^2 + 13n^3/3 + 3n^4/2 + n^5/5

Factor[%]
    (1/30) (1 + n) (2 + n) (3 + 2n) (5 + 9n + 3n^2)

f[n_] := a n^5 + b n^4 + c n^3 + d n^2 + e n + h

Solve[Coefficient[f[n] + (n + 1)^4, n, 4] ==
  Coefficient[f[n + 1], n, 4], a]
    {{a -> 1/5}}

a = 1/5
    1/5

Solve[Coefficient[f[n] + (n + 1)^4, n, 3] ==
  Coefficient[f[n + 1], n, 3], b]
    {{b -> 1/2}}

b = 1/2
    1/2

Solve[Coefficient[f[n] + (n + 1)^4, n, 2] ==
  Coefficient[f[n + 1], n, 2], c]
    {{c -> 1/3}}

c = 1/3
    1/3

Solve[Coefficient[f[n] + (n + 1)^4, n] ==
  Coefficient[f[n + 1], n], d]
    {{d -> 0}}

d = 0
    0

Solve[Coefficient[f[n] + (n + 1)^4, n, 0] ==
  Coefficient[f[n + 1], n, 0], e]
    {{e -> -1/30}}

e = -1/30
    -1/30

Solve[f[1] == 1, h]
    {{h -> 0}}

h = 0
    0

Factor[f[n]]
    (1/30) n (1 + n) (1 + 2n) (-1 + 3n + 3n^2)

***END OF PROGRAM***


• The reader should note the difference in appearance of a MATHEMATICA session
from a MAPLE session. As with MAPLE, a command line (which may appear on
more than one printed line) is usually followed by the program’s response.

• The first program command is practice to familiarize us with the language. The
computer’s response gives us confidence that we used the command appropriately.

• The second command evaluates symbolically
$\sum_{k=1}^{n} k^4 = \frac{1}{30} n(1+n)(1+2n)(-1+3n+3n^2)$.

• Steps 3 through 6 give the induction argument to establish the above formula.

• We begin an exploration of how to arrive at the above formula. From our work on
sums of first, second and third powers of integers, it is reasonable to expect that
$\sum_{k=1}^{n} k^4$ is a fifth degree polynomial in $n$.

• Steps 7 through 21 of the program determine this polynomial. The commands use
language that is very close to mathematical expressions and the reader should be
able to follow it.

• In the above program we equated the coefficients of the zeroth, first, second, third
and fourth powers of $n$ in two polynomials of degree 5 to evaluate some undetermined
coefficients. We did not use an equation for fifth powers. Why not?
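
The coefficient comparison carried out in the MATHEMATICA session can be imitated in a few lines of Python. This is a sketch under stated assumptions, not a transcription of the session: `coeff[j]` is a hypothetical name for the coefficient of $n^j$ in $f(n)$, so `coeff[5]`, ..., `coeff[0]` play the roles of $a$, $b$, $c$, $d$, $e$ and $h$, and the binomial expansion of $(n+1)^j$ replaces the `Coefficient` command.

```python
from fractions import Fraction
from math import comb


def fourth_power_sum_coefficients():
    """Return {j: coefficient of n^j} for f with f(n+1) - f(n) = (n+1)^4 and f(1) = 1."""
    coeff = {}
    # The coefficient of n^m in f(n+1) - f(n) is the sum over j > m of coeff[j] * C(j, m);
    # it must equal C(4, m), the coefficient of n^m in (n+1)^4.
    for m in range(4, -1, -1):
        known = sum(coeff[j] * comb(j, m) for j in range(m + 2, 6))
        coeff[m + 1] = (Fraction(comb(4, m)) - known) / comb(m + 1, m)
    # The constant term is fixed by the condition f(1) = 1.
    coeff[0] = 1 - sum(coeff[j] for j in range(1, 6))
    return coeff


coeff = fourth_power_sum_coefficients()


def f(n):
    """Evaluate the polynomial with the coefficients found above."""
    return sum(coeff[j] * n**j for j in coeff)


# Compare with a direct summation of fourth powers.
for n in range(1, 50):
    assert f(n) == sum(k**4 for k in range(1, n + 1))
```

The equations are solved from the highest power downwards because the equation for $n^m$ involves only the coefficients of powers higher than $m$ in $f$.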


3. The division algorithm: gcd and lcm 


The fact that the non-zero integers are not closed under the binary operation of division,
rather than being a problem, presents an opening for all kinds of investigations into the deeper
properties of integers; some of these have practical implications, as we will see later.


DEFINITION 1.5. Let $a$ and $b \in \mathbb{Z}$. We say that $a$ divides $b$ or $a$ is a factor of $b$ or $b$ is a
multiple of $a$ (and write $a|b$) if there exists a $q \in \mathbb{Z}$ such that $b = qa$.


REMARK 1.6. Note that for all $a \in \mathbb{Z}$, $a|0$. Thus every integer (including 0) divides 0.
But only 0 is a multiple of 0, as expected.


CAUTION 1.7. Do not confuse the symbols $a|b$ and $\frac{a}{b}$. The first states, more or less,
that $b$ (which may be 0) can be divided by $a$ to obtain an integer; the second represents the
number obtained by dividing $a$ by $b$ (which must be assumed $\ne 0$), which need not be an
integer.


PROPOSITION 1.8. Let $a$, $b$, $c$, $\beta$ and $\gamma \in \mathbb{Z}$. If $a|b$ and $a|c$, then $a|(\beta b + \gamma c)$.


PROOF. That $a|b$ and $a|c$ means the existence of integers $q_1$ and $q_2$ such that $b = q_1 a$
and $c = q_2 a$. Thus

    $\beta b + \gamma c = \beta q_1 a + \gamma q_2 a = (\beta q_1 + \gamma q_2)a$.


EXAMPLE 1.9. For all $n \in \mathbb{Z}_{>0}$, $13 | (4^{2n-1} + 3^{n+1})$.


PROOF. The proof is by induction on $n$. The starting point, $n = 1$, is of course trivial.
We assume that we have the divisibility condition for $k \ge 1$ and establish it for the successor
integer $k + 1$:

    $4^{2k+1} + 3^{k+2} = 4^2 4^{2k-1} + 4^2 3^{k+1} - 4^2 3^{k+1} + 3 \cdot 3^{k+1} = 16 (4^{2k-1} + 3^{k+1}) - (16 - 3) 3^{k+1}$;

the induction hypothesis tells us that $13 | (4^{2k-1} + 3^{k+1})$ and since $13 | (3 - 16)$, the last
proposition tells us that $13 | (4^{2k+1} + 3^{k+2})$.
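
As a sanity check on the algebra in the induction step, the following Python sketch (an illustration only; the function name is made up) verifies both the displayed identity and the divisibility claim for a range of exponents.

```python
def value(n: int) -> int:
    """The integer 4^(2n-1) + 3^(n+1) of Example 1.9."""
    return 4 ** (2 * n - 1) + 3 ** (n + 1)


for k in range(1, 200):
    # Identity from the induction step: value(k+1) = 16 value(k) - (16 - 3) 3^(k+1).
    assert value(k + 1) == 16 * value(k) - 13 * 3 ** (k + 1)
    # The statement of the example.
    assert value(k) % 13 == 0
```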


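DEFINITION 1.10. Let $n \in \mathbb{Z}_{\ge 0}$. We define $n!$ (to be read $n$-factorial) by induction as

    $n! = \begin{cases} 1 & \text{for } n = 0 \\ n(n-1)! & \text{for } n > 0 \end{cases}$,

and if $k \in \mathbb{Z}$ with $0 \le k \le n$, then we let

    $\binom{n}{k} = \frac{n!}{k!(n-k)!}$

(these are called the binomial coefficients ($n$ choose $k$)).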


The next result does not depend on divisibility properties and could have been established 
in the previous section. 


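THEOREM 1.11 (The binomial theorem). For all $n \in \mathbb{Z}_{>0}$ and all $x$ and $y \in \mathbb{Z}$,

    $(x+y)^n = \sum_{i=0}^{n} \binom{n}{i} x^{n-i} y^i$.


PROOF. We fix $x$ and $y$ and use induction on $n$. The base case, $n = 1$, is trivial. Assume
that $k \ge 1$ and that we have the result for $n = k$; that is,

    $(x+y)^k = \sum_{i=0}^{k} \binom{k}{i} x^{k-i} y^i$.

For the induction argument,

    $(x+y)^{k+1} = (x+y)^k (x+y) = \left(\sum_{i=0}^{k} \binom{k}{i} x^{k-i} y^i\right)(x+y) = \sum_{i=0}^{k+1} a_i x^{k+1-i} y^i$

for some integers $a_0, a_1, \ldots, a_{k+1}$ that we need to determine. Obviously

    $a_0 = \binom{k}{0} = 1 = \binom{k+1}{0}$ and $a_{k+1} = \binom{k}{k} = 1 = \binom{k+1}{k+1}$.

For (the interesting cases) $1 \le i \le k$,

    $a_i = \binom{k}{i} + \binom{k}{i-1} = \frac{k!}{i!(k-i)!} + \frac{k!}{(i-1)!(k-i+1)!} = k!\,\frac{(k-i+1)+i}{i!(k+1-i)!} = \frac{(k+1)!}{i!(k+1-i)!} = \binom{k+1}{i}$.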


REMARK 1.12. We have never used that $x$ and $y$ are integers. The theorem is valid for
general indeterminates $x$ and $y$.
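
The recursion $a_i = \binom{k}{i} + \binom{k}{i-1}$ from the proof of the binomial theorem gives a simple way to compute binomial coefficients without factorials. The Python sketch below is illustrative; the function names are hypothetical, and the check is done for integer values of $x$ and $y$ only.

```python
def binomial_row(n: int) -> list:
    """Coefficients of (x + y)^n, built with the rule a_i = C(k, i) + C(k, i-1) from the proof."""
    row = [1]                                   # coefficients of (x + y)^0
    for _ in range(n):
        # The end coefficients are 1; each interior one is the sum of two neighbours.
        row = [1] + [row[i] + row[i - 1] for i in range(1, len(row))] + [1]
    return row


def expand_power(x: int, y: int, n: int) -> int:
    """Evaluate the right-hand side of the binomial theorem for integers x and y."""
    row = binomial_row(n)
    return sum(c * x ** (n - i) * y ** i for i, c in enumerate(row))


# Compare with a direct evaluation of (x + y)^n.
for n in range(1, 12):
    for x in range(-3, 4):
        for y in range(-3, 4):
            assert expand_power(x, y, n) == (x + y) ** n
```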


THEOREM 1.13 (The division algorithm). For all $a \in \mathbb{Z}_{>0}$ and all $b \in \mathbb{Z}_{\ge 0}$, there exist
unique integers $q$ and $r$ such that $b = aq + r$ and $0 \le r < a$.


PROOF. The proof has two parts.
Existence: If $a > b$, then $q = 0$ and $r = b$. Now assume that $a \le b$. We let

    $D = \{b - ak; k \in \mathbb{Z}_{\ge 0} \text{ and } b - ak \ge 0\}$.

The set of non-negative integers $D$ is not empty since it contains $b$ (we use $k = 0$). It is
bounded from below (by 0). Hence it contains a least element $r$; further, $b - aq = r$ for some
$q \in \mathbb{Z}_{\ge 0}$. We need to verify that $0 \le r < a$. Since $r \in D$, $r \ge 0$. If $r \ge a$, then

    $0 \le r - a = b - a(q+1)$.

We conclude that $r - a \in D$, contradicting the fact that $r$ was a smallest element of $D$.
Uniqueness: Assume that $b = aq + r$ as in the statement of the theorem and also that
$b = aq_1 + r_1$ for some integers $q_1$ and $r_1$ with $0 \le r_1 < a$. It involves no loss of generality to
assume that $r_1 \ge r$. Thus

    $a(q - q_1) = (r_1 - r)$,

and we conclude that $a | (r_1 - r)$. If $r_1 \ne r$, then $0 < r_1 - r \le r_1 < a$ and so $a$ cannot divide
$(r_1 - r)$. We conclude that $r_1 = r$ and hence also $q_1 = q$.


EXAMPLE 1.14. For $b = 17$ and $a = 3$, $q = 5$ and $r = 2$.


REMARK 1.15. The last theorem is valid for all $b \in \mathbb{Z}$. We establish the existence part
for $b < 0$. By the theorem as stated, there exist unique integers $q$ and $r$ such that

    $-b = aq + r$, $0 \le r < a$.

Thus

    $b = a(-q) + (-r)$.

If $r = 0$, we are done; otherwise we continue with

    $b = a(-q) + (-r) = a(-q - 1) + (a - r)$.

Since $0 < a - r < a$, we have concluded the existence argument. Note that the proof of the
uniqueness part of the theorem never assumed that $b$ was non-negative. Why is it unnecessary
to consider $a \in \mathbb{Z}_{<0}$? If we also want to consider such $a$, it is convenient to introduce the
absolute value of $a \in \mathbb{Z}$ defined by

    $|a| = \begin{cases} a & \text{if } a \ge 0 \\ -a & \text{if } a < 0 \end{cases}$.

The division algorithm can now be stated as follows: For all $a$ and $b \in \mathbb{Z}$ with $a \ne 0$, there
exist unique integers $q$ and $r$ such that

    $b = aq + r$ and $0 \le r < |a|$.

This is the formulation we will use in the sequel.
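
The general formulation, with $0 \le r < |a|$ for any nonzero $a$, differs from the remainder conventions of many programming languages when $a$ or $b$ is negative. The Python sketch below shows one way to obtain the $q$ and $r$ of the division algorithm as stated here. It is an illustration: `divide` is a hypothetical name, and the code relies on the fact that Python's built-in `divmod` returns a non-negative remainder whenever the divisor is positive.

```python
def divide(b: int, a: int):
    """Return (q, r) with b == a*q + r and 0 <= r < |a|; requires a != 0."""
    if a == 0:
        raise ValueError("a must be nonzero")
    # For a positive divisor, divmod guarantees 0 <= r < |a|, also when b is negative.
    q, r = divmod(b, abs(a))
    if a < 0:
        q = -q                # b = |a| q + r = a (-q) + r
    return q, r


# Check the defining properties for all four sign combinations.
for b in range(-40, 41):
    for a in list(range(-9, 0)) + list(range(1, 10)):
        q, r = divide(b, a)
        assert b == a * q + r and 0 <= r < abs(a)
```

For $b = 17$ and $a = 3$ this returns $q = 5$ and $r = 2$, as in Example 1.14.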


DEFINITION 1.16. It is useful to introduce two definitions with notation motivated by
computer science. Let $a$ and $b$ be integers with $a > 0$. We define the integral content or
floor $\lfloor \frac{b}{a} \rfloor$ of the rational number $\frac{b}{a}$ as⁴ the largest integer $\le \frac{b}{a}$ and the ceiling $\lceil \frac{b}{a} \rceil$ of $\frac{b}{a}$ as the
smallest integer $\ge \frac{b}{a}$. We define $r = r\left(\frac{b}{a}\right)$ by

    (3)    $b = a \left\lfloor \frac{b}{a} \right\rfloor + r\left(\frac{b}{a}\right)$.

REMARK 1.17. Note that $0 \le r\left(\frac{b}{a}\right) < a$ and that (3) is another way of writing the
division algorithm. The formula is also valid, with proper interpretation, for negative $a$ since
Sg at leak alt =a a neg) =F a 
THEOREM 1.18. Let $a$ and $b \in \mathbb{Z}$, not both 0. There exists a unique $d = (a,b) =
\gcd(a,b) \in \mathbb{Z}_{>0}$ such that
(i) $d|a$ and $d|b$ and
(ii) $c|d$ whenever $c \in \mathbb{Z}$, $c|a$ and $c|b$.


⁴ For this definition we need the concept of order relations on the rationals. See, for example, the next
chapter for a discussion of this topic.




PROOF. Let

    $D = \{as + bt; s \text{ and } t \in \mathbb{Z} \text{ and } as + bt > 0\}$.

The set $D$ is not empty (it contains⁵ either $|a|$ or $|b|$) and is bounded from below (by 0). It
hence contains a smallest (positive) element $d = as_o + bt_o$, where $s_o$ and $t_o \in \mathbb{Z}$. We have
produced $d$. Now we must verify its claimed properties. For the proof of (ii), note that $c \ne 0$
and we may assume that $c \in \mathbb{Z}_{>0}$. Since it divides both $a$ and $b$, it obviously divides $d$. This
establishes (ii). By the division algorithm $a = qd + r$, where $r$ and $q \in \mathbb{Z}$ with $0 \le r < d$.
Thus

    $r = a - qd = a - q(as_o + bt_o) = a(1 - qs_o) + b(-qt_o)$,

and if $r > 0$, then it belongs to $D$ and is smaller than $d$. This contradiction shows that
$r = 0$ and hence $d|a$. Similarly $d|b$. We have established existence. For uniqueness assume
that $d_1 \in \mathbb{Z}_{>0}$ also satisfies conditions (i) and (ii) (with $d$ replaced by $d_1$, of course). We use
(i) for $d$ and (ii) with $c = d$ to conclude that $d | d_1$. Similarly $d_1 | d$. Since both $d$ and $d_1$ are
positive integers, we conclude that $d = d_1$.


DEFINITION 1.19. The last theorem defined the two symbols $(a,b)$ and $\gcd(a,b)$ that we
abbreviated by the symbol $d$. We call $d$ the greatest common divisor of $a$ and $b$, and we
say that $a$ and $b$ are relatively prime if $d = 1$.


COROLLARY 1.20 (of proof). For all $a$ and $b \in \mathbb{Z}$, not both 0, $(a,b)$ is the smallest positive
integral linear combination of $a$ and $b$.
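
The corollary can be checked numerically on small cases. The sketch below is only an illustration, not part of the proof: it searches the coefficients $s$ and $t$ in a bounded window (the window size `max(|a|, |b|)` is our assumption, chosen large enough for these examples) and compares the smallest positive value of $as + bt$ with Python's `math.gcd`.

```python
from math import gcd


def smallest_positive_combination(a, b):
    """Smallest positive value of a*s + b*t with s, t in a bounded window."""
    bound = max(abs(a), abs(b))          # assumed search window for s and t
    window = range(-bound, bound + 1)
    return min(a * s + b * t
               for s in window for t in window
               if a * s + b * t > 0)


for a, b in [(25, 12), (172, 30), (-172, 30), (0, 7)]:
    print((a, b), smallest_positive_combination(a, b), gcd(a, b))
```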


REMARK 1.21. Note that $(a,0) = |a|$ for $a \in \mathbb{Z}_{\ne 0}$, and that $(a,b) = (|a|,|b|)$ for $a$ and
$b \in \mathbb{Z}$, not both 0. It is convenient to extend the definition of the gcd to include $(0,0) = 0$.
Note also that $(a,1) = 1$ for all integers $a$; that is, all integers are relatively prime to 1.


EXAMPLE 1.22. $(25,12) = (25,-12) = 1$, $1 = 1 \cdot 25 + (-2)12$ and $1 = 1 \cdot 25 + 2(-12)$.


LEMMA 1.23. Let $a$ and $b \in \mathbb{Z}$, and let $b = aq + r$ with $q$ and $r \in \mathbb{Z}$. Then $(a,b) = (a,r)$.


PROOF. Let $d = (a,b)$. Then $d \mid r$ (since $r = b - aq$) and thus $d \mid (a,r)$. But also $(a,r) \mid b$ (and trivially
$(a,r) \mid a$); hence $(a,r) \mid d$ and we must have that $(a,r) = d$.


THEOREM 1.24 (The Euclidean algorithm). Let $a$ and $b \in \mathbb{Z}$ with $a \ne 0$. Then
(a) if $a$ divides $b$, there exists a unique $q_1 \in \mathbb{Z}$ such that
$$b = aq_1 \quad\text{and}\quad (a,b) = |a|,$$
and
(b) if $a$ does not divide $b$, there exist a unique $n \in \mathbb{Z}_{>0}$, unique $r_1, r_2, \ldots, r_n \in \mathbb{Z}_{>0}$ and
unique $q_1, q_2, \ldots, q_n, q_{n+1} \in \mathbb{Z}$ such that
$$\begin{aligned}
b = r_{-1} &= a q_1 + r_1, & 0 < r_1 < |a|, \\
a = r_0 &= r_1 q_2 + r_2, & 0 < r_2 < r_1, \\
r_1 &= r_2 q_3 + r_3, & 0 < r_3 < r_2, \\
&\ \ \vdots \\
r_{n-2} &= r_{n-1} q_n + r_n, & 0 < r_n < r_{n-1}, \\
r_{n-1} &= r_n q_{n+1},
\end{aligned}$$
and $(a,b) = r_n$.


⁵If $a \ne 0$, then $D$ contains $a = 1a$ if $a > 0$ and it contains $-a = (-1)a$ if $a < 0$.


PROOF. Part (a) of the theorem has, of course, already been established. For part (b),
the existence and uniqueness of $n$, and the collections of $r_i$ and $q_i$ follow from the division
algorithm. The form of the last line in the list of equations follows from the fact that the $r_i$
are strictly decreasing. The last lemma tells us that
$$(b,a) = (a,r_1) = (r_1,r_2) = \cdots = (r_{n-2},r_{n-1}) = (r_{n-1},r_n) = r_n.$$


REMARK 1.25. It is useful to introduce some convenient notational conventions.


• To have consistency of notation we labeled $b = r_{-1}$ and $a = r_0$.
• The last line of the algorithm reads
$$r_{n-1} = r_n q_{n+1} + r_{n+1} \quad\text{with } r_{n+1} = 0.$$
• Note also that for $i = 1, 2, \ldots, n+1$, $q_i = \lfloor r_{i-2}/r_{i-1} \rfloor$.


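EXAMPLE 1.26. We apply the Euclidean algorithm to $a = 30$ and $b = 172$:
$$\begin{aligned}
172 &= 30 \cdot 5 + 22 \\
30 &= 22 \cdot 1 + 8 \\
22 &= 8 \cdot 2 + 6 \\
8 &= 6 \cdot 1 + 2 \\
6 &= 2 \cdot 3.
\end{aligned}$$
Thus $(172,30) = 2$. We know that there exist integers $r$ and $s$ such that $2 = 172r + 30s$.
We find them by reading the Euclidean algorithm backwards (starting with the next to last
line):
$$\begin{aligned}
2 &= 8 - 6 = 8 - (22 - 2 \cdot 8) \\
&= 3 \cdot 8 - 22 = 3(30 - 22) - 22 \\
&= 3 \cdot 30 - 4 \cdot 22 = 3 \cdot 30 - 4(172 - 5 \cdot 30) \\
&= 23 \cdot 30 - 4 \cdot 172.
\end{aligned}$$
Thus $r = -4$ and $s = 23$.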
We expect to get the same result for a = 172 and b = 30. The calculations for the Euclidean 
algorithm should also read more or less the same as above. They do, except that the 
calculations have an extra line at the start: 


$$30 = 172 \cdot 0 + 30.$$
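
The chain of divisions in Theorem 1.24 can be generated mechanically. The following Python sketch (the name `euclid_steps` is ours) records each line as a tuple (dividend, divisor, quotient, remainder); the gcd is the absolute value of the last divisor.

```python
def euclid_steps(a, b):
    """Lines of the Euclidean algorithm for b divided by a (a non-zero).

    Each entry is (dividend, divisor, quotient, remainder) with
    dividend == divisor * quotient + remainder and 0 <= remainder < |divisor|.
    """
    if a == 0:
        raise ValueError("a must be non-zero")
    steps = []
    dividend, divisor = b, a
    while divisor != 0:
        r = dividend % abs(divisor)       # remainder in the range 0 <= r < |divisor|
        q = (dividend - r) // divisor     # exact division
        steps.append((dividend, divisor, q, r))
        dividend, divisor = divisor, r
    return steps


for line in euclid_steps(30, 172):
    print("%d = %d * %d + %d" % line)
```

Calling `euclid_steps(172, 30)` produces the same lines preceded by the extra line $30 = 172 \cdot 0 + 30$, as noted above.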


We systematize the above procedure using ideas suggested by the row reduction method of
linear algebra. We describe the GCD algorithm. (We use the notation introduced in Theorem
1.24.) The algorithm consists of calculating $n + 2$ matrices and producing $n + 1$ arrows
(corresponding to row operations on matrices) between them; the computations involve only
$2 \times 2$ integer matrices and integer vectors written as columns. We fix $a$ and $b \in \mathbb{Z}$ and
assume that neither integer divides the other.⁶ The aim is to compute $(a,b)$ and express it
as an integral linear combination of $a$ and $b$. It involves no loss of generality to assume that
$|b| > |a|$. For notational and computational convenience we use expanded ($2 \times 3$) matrices
of the form
$$\left[\begin{array}{cc|c} \alpha & \beta & x \\ \gamma & \delta & y \end{array}\right] \tag{4}$$
with integer entries. This last expanded matrix is understood to stand for the matrix product
$$\begin{bmatrix} \alpha & \beta \\ \gamma & \delta \end{bmatrix} \begin{bmatrix} b \\ a \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix}. \tag{5}$$


⁶The case where either $a \mid b$ or $b \mid a$ is, of course, trivial.


The key to the method is the realization that standard row operations preserve this symbolism.
We now describe the first three steps in the algorithm to find $(a,b)$ and express it as
an integral linear combination of $a$ and $b$.
$$\left[\begin{array}{cc|c} 1 & 0 & b = aq_1 + r_1 \\ 0 & 1 & a \end{array}\right]
\xrightarrow{\ q_1\ }
\left[\begin{array}{cc|c} 1 & -q_1 & r_1 \\ 0 & 1 & a = r_1 q_2 + r_2 \end{array}\right]
\xrightarrow{\ q_2\ }
\left[\begin{array}{cc|c} 1 & -q_1 & r_1 \\ -q_2 & 1 + q_1 q_2 & r_2 \end{array}\right].$$
The first expanded matrix is obvious: the $2 \times 2$ identity matrix followed after the long
vertical dash by the column vector $\begin{bmatrix} b \\ a \end{bmatrix}$. We have supplied an equality for $b$ using the
first step of the Euclidean algorithm to justify the method. The substitution $b = aq_1 + r_1$
is not needed in practice. Recall that $q_1 = \lfloor b/a \rfloor$. The $q_1$ over the first arrow indicates that
we should multiply the second row by $q_1$ and subtract it from the first row to obtain the
second expanded matrix; that is, we are subtracting from the first row the largest integral
multiple of the second row that leaves the rightmost entry of the first row nonnegative. The
$q_2$ over the second arrow indicates that we should multiply the first row (again this is
the row whose third entry has smallest absolute value in its column) by $q_2$ and subtract it from
the second row to obtain the third expanded matrix; that is, we are subtracting from the
second row the largest integral multiple of the first row that leaves the rightmost entry of
the second row nonnegative. For convenience we place the arrow on the same line as the row
whose multiple is being subtracted. We stop this alternating process when we first obtain a
0 as the rightmost entry. If, at this stage, the row with the $\ne 0$ rightmost entry is $[r, s, d]$,
then, in agreement with (5), $(a,b) = d = rb + sa$. The line with the 0 entry in the last matrix, $[\rho, \sigma, 0]$, tells us that
$0 = \rho b + \sigma a$.
We illustrate with a = 30 and b = 172: 


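$$\left[\begin{array}{cc|c} 1 & 0 & 172 \\ 0 & 1 & 30 \end{array}\right]
\xrightarrow{\ 5\ }
\left[\begin{array}{cc|c} 1 & -5 & 22 \\ 0 & 1 & 30 \end{array}\right]
\xrightarrow{\ 1\ }
\left[\begin{array}{cc|c} 1 & -5 & 22 \\ -1 & 6 & 8 \end{array}\right]
\xrightarrow{\ 2\ }
\left[\begin{array}{cc|c} 3 & -17 & 6 \\ -1 & 6 & 8 \end{array}\right]$$
$$\xrightarrow{\ 1\ }
\left[\begin{array}{cc|c} 3 & -17 & 6 \\ -4 & 23 & 2 \end{array}\right]
\xrightarrow{\ 3\ }
\left[\begin{array}{cc|c} 15 & -86 & 0 \\ -4 & 23 & 2 \end{array}\right].$$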


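We conclude (once again) that $(172,30) = 2 = -4 \cdot 172 + 23 \cdot 30$. Also that $0 = 15 \cdot 172 - 86 \cdot 30$.
Signs do not alter much. We take up the case $a = 30$ and $b = -172$:
$$\left[\begin{array}{cc|c} 1 & 0 & -172 \\ 0 & 1 & 30 \end{array}\right]
\xrightarrow{\ -6\ }
\left[\begin{array}{cc|c} 1 & 6 & 8 \\ 0 & 1 & 30 \end{array}\right]
\xrightarrow{\ 3\ }
\left[\begin{array}{cc|c} 1 & 6 & 8 \\ -3 & -17 & 6 \end{array}\right]
\xrightarrow{\ 1\ }
\left[\begin{array}{cc|c} 4 & 23 & 2 \\ -3 & -17 & 6 \end{array}\right]
\xrightarrow{\ 3\ }
\left[\begin{array}{cc|c} 4 & 23 & 2 \\ -15 & -86 & 0 \end{array}\right].$$
We conclude (not surprisingly) that $(-172,30) = 2 = 4(-172) + 23 \cdot 30$ and $0 = (-15)(-172) - 86 \cdot 30$.
It may be only slightly surprising that the introduction of a minus sign shortened
the calculation.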
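
The row-reduction procedure can be sketched in Python as follows. This is an illustrative implementation under our own naming (`gcd_algorithm`, `bezout`), not code from the text; every row `[x, y, v]` of every matrix satisfies `x*b + y*a == v`, which is the content of (5).

```python
def gcd_algorithm(a, b):
    """Return the list of expanded matrices A_0, A_1, ... for a != 0.

    Each matrix is a pair of rows [x, y, v] with x*b + y*a == v.
    """
    if a == 0:
        raise ValueError("a must be non-zero")
    rows = [[1, 0, b], [0, 1, a]]
    history = [[row[:] for row in rows]]
    target, pivot = 0, 1      # multiples of rows[pivot] are subtracted from rows[target]
    while rows[pivot][2] != 0:
        r = rows[target][2] % abs(rows[pivot][2])     # new rightmost entry, >= 0
        q = (rows[target][2] - r) // rows[pivot][2]   # the number over the arrow
        rows[target] = [t - q * p for t, p in zip(rows[target], rows[pivot])]
        history.append([row[:] for row in rows])
        target, pivot = pivot, target                 # alternate the roles of the rows
    return history


def bezout(a, b):
    """Return (d, x, y) with d = gcd(a, b) > 0 and d == x*b + y*a."""
    final = gcd_algorithm(a, b)[-1]
    x, y, d = next(row for row in final if row[2] != 0)
    if d < 0:                 # only when a < 0 divides b
        x, y, d = -x, -y, -d
    return d, x, y


print(gcd_algorithm(30, 172)[-1])   # [[15, -86, 0], [-4, 23, 2]]
print(bezout(30, -172))             # (2, 4, 23)
```

The two printed results agree with the two worked examples above.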
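THE GCD ALGORITHM: a formal description
The algorithm can be described as a diagram consisting of $n + 2$ matrices $\{A_i\}$; $i = 0, 1, \ldots, n+1$,
of the form (4) (that hence satisfy (5)), and $n + 1$ maps
$$q_i : A_{i-1} \to A_i, \quad i = 1, 2, \ldots, n+1.$$
The $i$-th such map is represented by an arrow with the number $q_i$ above it⁷:
$$A_0 = \left[\begin{array}{cc|c} 1 & 0 & r_{-1} \\ 0 & 1 & r_0 \end{array}\right]
\xrightarrow{\ q_1\ }
A_1 = \left[\begin{array}{cc|c} 1 & -q_1 & r_1 \\ 0 & 1 & r_0 \end{array}\right]
\xrightarrow{\ q_2\ }
A_2 = \left[\begin{array}{cc|c} 1 & -q_1 & r_1 \\ -q_2 & 1 + q_1 q_2 & r_2 \end{array}\right]$$
$$\xrightarrow{\ q_3\ }
A_3 = \left[\begin{array}{cc|c} 1 + q_3 q_2 & -q_1 - q_3(1 + q_1 q_2) & r_3 \\ -q_2 & 1 + q_1 q_2 & r_2 \end{array}\right]
\to \cdots$$
$$\cdots \to
A_i = \left[\begin{array}{cc|c} \alpha_i & \beta_i & r_{i-1} \\ \gamma_i & \delta_i & r_i \end{array}\right]
\xrightarrow{\ q_{i+1}\ }
A_{i+1} = \left[\begin{array}{cc|c} \alpha_{i+1} & \beta_{i+1} & r_{i+1} \\ \gamma_{i+1} & \delta_{i+1} & r_i \end{array}\right]
\to \cdots$$
$$\cdots \to
A_n = \left[\begin{array}{cc|c} \alpha_n & \beta_n & r_n \\ \gamma_n & \delta_n & r_{n-1} \end{array}\right]
\xrightarrow{\ q_{n+1}\ }
A_{n+1} = \left[\begin{array}{cc|c} \alpha_{n+1} & \beta_{n+1} & (a,b) \\ \gamma_{n+1} & \delta_{n+1} & 0 \end{array}\right].$$
The starting matrix is $A_0 = \left[\begin{array}{cc|c} 1 & 0 & b \\ 0 & 1 & a \end{array}\right]$. For $i = 1, 2, \ldots, n+1$, the number $q_i$ is
obtained from the entries in the matrix $A_{i-1}$, and the matrix $A_i$ is obtained by applying the
operator $q_i$ to the matrix $A_{i-1}$. This operator depends on the parity of the integer $i$. For
the above diagram, we have assumed that $i$ is even and $n$ is odd. The integer $q_i$ is computed
from the last column of the matrix $A_{i-1}$. For odd $i$, the operator $q_i$ takes the second row of
the matrix $A_{i-1}$ and turns it into the second row of the matrix $A_i$; and it sets the first row
of the matrix $A_i$ to be the first row of the matrix $A_{i-1}$ minus $q_i$ times its second row. For
even $i$, the roles of the rows are reversed.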


PROOF. We need to verify that each of the matrices $A_i$ satisfies (5). We use induction
on $i$. The matrix $A_0$ satisfies (5) by construction. So assume that for a given integer $s$,
$0 \le s < n+1$, the matrix $A_s$ satisfies (5). Let us assume that $s$ is even.⁸ Thus
$$\alpha_s b + \beta_s a = r_{s-1}$$
and
$$\gamma_s b + \delta_s a = r_s.$$
We let $q_{s+1} = \lfloor r_{s-1}/r_s \rfloor$. Now
$$\begin{aligned}
\alpha_{s+1} &= \alpha_s - q_{s+1}\gamma_s, \\
\beta_{s+1} &= \beta_s - q_{s+1}\delta_s, \\
\gamma_{s+1} &= \gamma_s, \\
\delta_{s+1} &= \delta_s
\end{aligned}$$
and
$$r_{s+1} = r_{s-1} - q_{s+1} r_s.$$
Hence
$$\alpha_{s+1} b + \beta_{s+1} a = (\alpha_s - q_{s+1}\gamma_s) b + (\beta_s - q_{s+1}\delta_s) a = r_{s-1} - q_{s+1}(\gamma_s b + \delta_s a) = r_{s-1} - q_{s+1} r_s = r_{s+1}$$
and
$$\gamma_{s+1} b + \delta_{s+1} a = \gamma_s b + \delta_s a = r_s;$$
finishing the induction argument.


⁷We view $q_i$ as an operator (map between matrices) and as a number (an integer); this should not cause
confusion.

⁸The argument for odd $s$ is similar.


MATHEMATICA SESSION #2


We illustrate the use of the GCD algorithm by computing (11235, 603). This is a
transcript of an interactive session.

m0 = {{1, 0, 11235}, {0, 1, 603}}
{{1, 0, 11235}, {0, 1, 603}}

q1 = Floor[11235/603]
18

m1 = m0 - 18 {{0, 1, 603}, {0, 0, 0}}
{{1, -18, 381}, {0, 1, 603}}

q2 = Floor[603/381]
1

m2 = m1 - {{0, 0, 0}, {1, -18, 381}}
{{1, -18, 381}, {-1, 19, 222}}

q3 = Floor[381/222]
1

m3 = m2 - {{-1, 19, 222}, {0, 0, 0}}
{{2, -37, 159}, {-1, 19, 222}}

q4 = Floor[222/159]
1

m4 = m3 - {{0, 0, 0}, {2, -37, 159}}
{{2, -37, 159}, {-3, 56, 63}}

Floor[159/63]
2

m5 = m4 - 2 {{-3, 56, 63}, {0, 0, 0}}
{{8, -149, 33}, {-3, 56, 63}}

q5 = Floor[63/33]
1

m6 = m5 - {{0, 0, 0}, {8, -149, 33}}
{{8, -149, 33}, {-11, 205, 30}}

q6 = Floor[33/30]
1

m7 = m6 - {{-11, 205, 30}, {0, 0, 0}}
{{19, -354, 3}, {-11, 205, 30}}

q7 = Floor[30/3]
10

m8 = m7 - 10 {{0, 0, 0}, {19, -354, 3}}
{{19, -354, 3}, {-201, 3745, 0}}

GCD[11235, 603]
3


***END OF PROGRAM***


(1) All but the last command of the program implement the GCD algorithm.
(2) The matrix m7 yields the gcd
$$(11235, 603) = 3 = 19 \cdot 11235 - 354 \cdot 603$$
and the matrix m8 yields the companion identity
$$0 = -201 \cdot 11235 + 3745 \cdot 603.$$
(3) The last section of the program shows the command that MATHEMATICA uses to
compute the gcd of two integers.


DEFINITION 1.27. Let $n \in \mathbb{Z}_{>0}$ and let $a_1, \ldots, a_n \in \mathbb{Z}$. We define the greatest common
divisor
$$(a_1, \ldots, a_n) = \gcd(a_1, \ldots, a_n)$$
of $a_1, \ldots, a_n$ to be 0 if all the $a_i$ are 0 and otherwise as the positive integer $m$ with the following
two properties:
(i) $m \mid a_i$ for $i = 1, 2, \ldots, n$, and
(ii) whenever $c \in \mathbb{Z}$, $c \ne 0$ and $c \mid a_i$ for $i = 1, 2, \ldots, n$, then also $c \mid m$.


REMARK 1.28. Some observations are required.
• It should be checked that the concept is well defined (that is, that $m$ exists and is
unique), as is done in the next theorem, and that the definition for $n = 2$ agrees with
the previous one that we used, as is obvious.
• For all $0 \ne a \in \mathbb{Z}$, $(a) = |a|$. So for $n = 1$ there are no issues involving existence or
uniqueness of $m$.


THEOREM 1.29. Let $n \in \mathbb{Z}_{>1}$. For all $a_1, \ldots, a_n \in \mathbb{Z}$, $(a_1, \ldots, a_n)$ exists and is unique.
Further
$$(a_1, \ldots, a_n) = ((a_1, \ldots, a_{n-1}), a_n). \tag{6}$$


PROOF. If all the $a_i = 0$, then there is nothing to prove. So assume that they are not
all zero. We use induction on $n \ge 2$. For the base case, $n = 2$, the existence of the gcd has
been established and (6) reads
$$(a_1, a_2) = (|a_1|, a_2);$$
a correct formula. So we assume now that $k > 2$ and that by induction we have the existence
of $(a_1, \ldots, a_{k-1})$ and (6) for $n = k-1$. We proceed to establish the existence of $(a_1, \ldots, a_k)$ as
well as (6) for $n = k$. Let $m = ((a_1, \ldots, a_{k-1}), a_k)$. By the induction hypothesis $(a_1, \ldots, a_{k-1})$
exists and is unique. The case $n = 2$ tells us that $m$ exists and is unique. We have only
to verify that $m$ has the required properties. So $m \mid (a_1, \ldots, a_{k-1})$ and $m \mid a_k$ from the $n = 2$
assumption. But for $i = 1, \ldots, k-1$, $(a_1, \ldots, a_{k-1}) \mid a_i$; so also $m \mid a_i$. If $c \in \mathbb{Z}$, $c \ne 0$ and $c \mid a_i$
for $i = 1, \ldots, k$, then also $c \mid (a_1, \ldots, a_{k-1})$ (the induction $k-1$ case) and hence $c \mid m$ (the $n = 2$
case). The proof of the uniqueness of the gcd is left to the reader.
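
Formula (6) says that the gcd of $n$ integers can be computed by folding the two-argument gcd over the list. A minimal Python sketch (the name `gcd_many` is ours; `math.gcd` returns a non-negative value and satisfies `gcd(0, 0) == 0`, matching the conventions of the text):

```python
from functools import reduce
from math import gcd


def gcd_many(values):
    """gcd of a list of integers, computed as ((a_1, ..., a_{n-1}), a_n)."""
    return reduce(gcd, values, 0)    # starting from 0 works since gcd(0, a) == |a|


print(gcd_many([12, 18, 30]))   # 6
print(gcd_many([-4]))           # 4
print(gcd_many([0, 0, 0]))      # 0
```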




THEOREM 1.30. Let $a$, $b$ and $c \in \mathbb{Z}$, none 0, and $(a,b) = 1$.
(i) If $a \mid bc$, then $a \mid c$.
(ii) If $a \mid c$ and $b \mid c$, then $ab \mid c$.


PROOF. That $a$ and $b$ are relatively prime tells us that there exist integers $r$ and $s$ such
that
$$1 = ar + bs.$$
Thus $c = car + cbs$. Assume that $a \mid bc$. Since $a \mid car$ and $a \mid bsc$, we see that then $a \mid c$, establishing
(i). Assume that $a \mid c$ and $b \mid c$; then $ab \mid cb$ and $ba \mid ca$. Thus also $ba \mid car$ and $ab \mid cbs$ and hence
$ab \mid (car + cbs) = c$. □


DEFINITION 1.31. Let $a$ and $b \in \mathbb{Z}$. We define the least common multiple $M$ of $a$ and
$b$, in symbols $M = \operatorname{lcm}(a,b)$, to be 0 if $a$ and $b$ are both 0. Otherwise, we define the lcm as the
unique $M \in \mathbb{Z}_{>0}$ that satisfies
(i) if $a \ne 0$, then $a \mid M$ and if $b \ne 0$, then $b \mid M$ and
(ii) if $c$ is a multiple of $a$ whenever $a \ne 0$ and a multiple of $b$ whenever $b \ne 0$, then $M \mid c$.


We leave it to the reader to define $\operatorname{lcm}(a_1, \ldots, a_n)$ and to prove the analogue of Theorem
1.29 for the lcm of $n$ integers.


EXERCISES 


(1) For each of the following pairs of integers a and b, find (a, b) and express it as ar +bs 
with r and s integers: 
(A) a= and b= 11, 
(b) a = —55 and b = 25. 
(c) a= —75 and b= 21. 
(d) a= —45 and b= —81. 
(e) a = 5245 and b = 1345. 
(f) a = 6321 and b = —291. 

(2) The Fibonacci sequence $\{F_n\}$ is defined inductively by the condition that the first
two terms of the sequence are 1 and each subsequent term is the sum of the two
preceding terms. Write down the formulae that define the terms of this sequence
and prove that for all $n \in \mathbb{Z}_{>0}$, $(F_n, F_{n+1}) = 1$.

(3) Let $a$, $b$ and $c \in \mathbb{Z}$, with at most one of these equal to zero. Assume that $(a,c) = 1 = (b,c)$.
Show that $(ab,c) = 1$.


(4) Show that the binomial coefficients $\binom{n}{k} \in \mathbb{Z}_{>0}$.
(5) (a) Let $m, n \in \mathbb{Z}_{>0}$. Prove the identity:
$$\binom{m+n}{k} = \sum_{i=0}^{k} \binom{m}{i} \binom{n}{k-i}.$$


Hint: Consider the polynomial equation
$$\sum_{k=0}^{m+n} \binom{m+n}{k} z^k = (1+z)^{m+n} = (1+z)^m (1+z)^n.$$




(b) Show that if n > 1, then 


(6) Show that if n € Zso, then 


ee 
ort, 

(7) Show that for all $a$ and $b \in \mathbb{Z}$ with $a \ne 0$, $\lfloor b/a \rfloor = -\lceil -b/a \rceil$.

(8) Augment the argument of Remark 1.15 to complete the proof of the division algo- 
rithm (both the existence and uniqueness claims) as given by (3) (consider the four 
cases of possible signs of a and b). Base the proof of existence on Theorem 1.13 and 
then supply a uniqueness proof. Give an alternate proof of existence that is valid 
in all cases (thus not relying on Theorem 1.13) by considering as before the set of 
integers 

$$D = \{b - ak;\ k \in \mathbb{Z} \text{ and } b - ak \ge 0\},$$
and establishing that this set is nonempty.

(9) Let $a$, $b$, $r$ and $s \in \mathbb{Z}$, not all zero. Assume that $d = ar + bs$. Is $|d| = (a,b)$? What
integers can be written as integral linear combinations of $a$ and $b$?

(10) Let $n \in \mathbb{Z}_{>0}$ and $a_1, \ldots, a_n \in \mathbb{Z}$, not all zero. Show that there exist $r_1, \ldots, r_n \in \mathbb{Z}$,
not all zero, such that
$$(a_1, \ldots, a_n) = \sum_{i=1}^{n} a_i r_i.$$

(11) Let $n$ be a positive integer and let $a_1, a_2, \ldots, a_n$ be $n$ integers, not all zero. Define
$\operatorname{lcm}(a_1, \ldots, a_n)$ and prove that for $n > 2$,
$$\operatorname{lcm}(a_1, \ldots, a_n) = \operatorname{lcm}(\operatorname{lcm}(a_1, \ldots, a_{n-1}), a_n).$$

(12) In the gcd algorithm, we started with integers $a$ and $b$ with $0 < |a| < |b|$. What
does the algorithm produce
• if we start with $a$ and $ka$, with $a$ and $k$ non-zero integers?
• if we start with the integer $a \ne 0$ and 0?

4. Primes 


The additive structure of the positive integers is rather simple. An arbitrary positive 
integer n is constructed from the integers 1 (n copies of the same integer) by n — 1 additions. 
The multiplicative structure of the positive integers is more complicated. We turn now to 
the multiplicative building blocks of $\mathbb{Z}_{>0}$.


DEFINITION 1.32. A number $p \in \mathbb{Z}_{>1}$ is prime provided it has precisely two distinct
positive divisors, namely 1 and $p$.


REMARK 1.33. Note that 1 is not a prime. 


We have a fairly efficient method for producing (relatively short) lists of primes known as
the sieve of Eratosthenes. It consists of a number of steps. We choose a positive integer
$N$ and produce a list of the primes less than or equal to $N$. We proceed as follows.


• (First step.) We start with the list of integers $2, 3, \ldots, N$. Notice that the first entry in
our list is the prime 2.

• (Second step.) We remove from our list all proper multiples of 2; that is, integers of
the form $\{2i;\ i \in \mathbb{Z}_{>0},\ 2 \le i \le N/2\}$. Notice that the first two entries in the resulting
list are the first two primes; namely 2 and 3.

• (Third step.) We remove from our list all proper multiples of 3; that is, integers of
the form $\{3i;\ i \in \mathbb{Z}_{>0},\ 2 \le i \le N/3\}$. Notice that the first three entries in the resulting
list are the first three primes 2, 3 and 5.

• After $r$ steps, we have produced a list that starts with the first $r$ primes: $2, 3, \ldots, p_r$.

• (The $(r+1)$-st step.) We remove from the list produced after $r$ steps all proper multiples
of the $r$-th prime $p_r$; that is, integers of the form $\{p_r i;\ i \in \mathbb{Z}_{>0},\ 2 \le i \le N/p_r\}$. The
resulting list starts with the first $r + 1$ primes.

• (The stopping time.) We are done after $r$ steps as soon as $p_r^2 > N$.


We need to prove that the above procedure does what we claim. We will do so after 
proving the next theorem (FTA). Obviously the sieve of Eratosthenes algorithm is best 
performed by a computer. A sample MAPLE program using N = 200 follows. 


> 


> 


MAPLE SESSION #2.


set1 := {seq(i, i = 2..200)};


set1 := {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 
48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 
70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 
92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 
110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 
126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 
142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 
158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 
174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 
190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200} 


set2 := set1 minus {seq( 2*i, i = 2..100)};


set2 := {2, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45,
47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 

91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 

125, 127, 129, 131, 133, 135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 

157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 181, 183, 185, 187, 

189, 191, 193, 195, 197, 199} 


> 


set3 := set2 minus {seq( 3*i, i = 2..67)};


set3 := {2, 3, 5, 7, 11, 13, 17, 19, 23, 25, 29, 31, 35, 37, 41, 43, 47, 49, 53, 55, 59, 61, 65,
67, 71, 73, 77, 79, 83, 85, 89, 91, 95, 97, 101, 103, 107, 109, 113, 115, 119, 121,
125, 127, 131, 133, 137, 139, 143, 145, 149, 151, 155, 157, 161, 163, 167, 169,
173, 175, 179, 181, 185, 187, 191, 193, 197, 199}

> set5 := set3 minus {seq( 5*i, i = 2..40)};


set5 := {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 49, 53, 59, 61, 67, 71, 73, 77,
79, 83, 89, 91, 97, 101, 103, 107, 109, 113, 119, 121, 127, 131, 133, 137, 139,
143, 149, 151, 157, 161, 163, 167, 169, 173, 179, 181, 187, 191, 193, 197, 199}

> set7 := set5 minus {seq( 7*i, i = 2..29)};


set7 := {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83,
89, 97, 101, 103, 107, 109, 113, 121, 127, 131, 137, 139, 143, 149, 151, 157, 163,
167, 169, 173, 179, 181, 187, 191, 193, 197, 199}

> set11 := set7 minus {seq( 11*i, i = 2..19)};


set11 := {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83,
89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 169,
173, 179, 181, 191, 193, 197, 199}

> set13 := set11 minus {seq( 13*i, i = 2..17)};


set13 := {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83,
89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173,
179, 181, 191, 193, 197, 199}

> set17 := set13 minus {seq( 17*i, i = 2..12)};

set17 := {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83,
89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173,
179, 181, 191, 193, 197, 199}

***END OF PROGRAM***


We will see later that care must be used in employing the MAPLE set theoretic command 
minus. 
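
For comparison, the steps of the sieve can also be sketched in Python. This is our own illustration (the name `sieve_steps` is hypothetical), written to follow the step-by-step description rather than to be fast; it keeps every intermediate list so that the first step (the full list) and each later removal step can be inspected.

```python
def sieve_steps(N):
    """Return the successive lists produced by the sieve for an integer N >= 2.

    lists[0] is the list after the first step (all integers 2..N); each
    later list is obtained by removing the proper multiples of the next prime.
    """
    current = list(range(2, N + 1))
    lists = [current]
    r = 0                                    # position of the next prime to sieve by
    while current[r] ** 2 <= N:              # stop as soon as its square exceeds N
        p = current[r]
        multiples = {p * i for i in range(2, N // p + 1)}   # proper multiples of p
        current = [m for m in current if m not in multiples]
        lists.append(current)
        r += 1
    return lists


steps = sieve_steps(200)
print(len(steps))        # number of steps carried out
print(steps[-1])         # the primes up to 200
```

For $N = 200$ the loop sieves by 2, 3, 5, 7, 11 and 13 and stops at 17, since $17^2 = 289 > 200$.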


THEOREM 1.34. Let $a$, $b \in \mathbb{Z}$ and $p$ be a prime. If $p \mid ab$, then either $p \mid a$ or $p \mid b$.


PROOF. Assume that $p$ does not divide $a$. Then $(p,a) = 1$, which implies that $p \mid b$ by
Theorem 1.30(i).


LEMMA 1.35. Let $a_i \in \mathbb{Z}$ for $i = 1, 2, \ldots, r$ (with $r \in \mathbb{Z}_{>0}$). If the prime $p$ divides the
product $a_1 \cdots a_r$, then $p \mid a_i$ for some $i$.


PROOF. The proof is by induction on $r$. The base case, $r = 1$, is trivial. So assume that
$r > 1$ and that $p \mid (a_1 \cdots a_{r-1}) a_r$. The previous theorem says that either $p \mid (a_1 \cdots a_{r-1})$ or $p \mid a_r$. In
the former case, the induction hypothesis guarantees that $p \mid a_i$ for some $1 \le i \le r-1$. □




THEOREM 1.36 (The fundamental theorem of arithmetic, FTA). Let $n \in \mathbb{Z}_{>1}$. Then
there exists a unique $r \in \mathbb{Z}_{>0}$ and primes $p_1, p_2, \ldots, p_r$ such that
$$n = p_1 p_2 \cdots p_r = \prod_{i=1}^{r} p_i.$$
The decomposition of $n$ into a product of primes is unique except for order; that is, if also
$$n = q_1 q_2 \cdots q_s$$
for some $s \in \mathbb{Z}_{>0}$ and primes $q_j$, $j = 1, \ldots, s$, then $s = r$ and for each $j$, there exists an $i$
such that $q_j = p_i$.


PROOF. We use strong induction on $n \ge 2$ to show that factorization is possible. The
base case is trivial since $n = 2$ is prime. So assume that $n > 2$. If $n$ is prime, there is nothing
to do. Otherwise $n = ab$ with $a$, $b \in \mathbb{Z}$, $1 < a < n$ and $1 < b < n$. By the strong induction
assumption, both $a$ and $b$ can be factored as products of primes. Hence so can their product
$ab$.

We use induction on $r \ge 1$ to show that factorization is unique. If $r = 1$, then $n = p_1$ is
prime. If also $n = q_1 q_2 \cdots q_s$, then $p_1 \mid q_j$ for some $j$ and it follows that $p_1 = q_j$ and $s = 1$. So
assume that $r > 1$ and that
$$n = p_1 p_2 \cdots p_r = q_1 q_2 \cdots q_s.$$
Then $p_1 \mid q_1 q_2 \cdots q_s$ and it must be the case that $p_1 \mid q_j$ for some $j$. As before we conclude that
$p_1 = q_j$. Reordering the $q_j$, we may and do assume that $j = 1$. Thus also $p_2 \cdots p_r = q_2 \cdots q_s$. We
conclude by induction that $r - 1 = s - 1$ and that each $p_i$ ($i > 1$) is a $q_j$ ($j > 1$).
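
The existence half of the theorem is constructive. As an illustration (a simple trial-division sketch under our own naming, adequate only for small $n$), the following Python function returns the primes $p_1 \le p_2 \le \cdots \le p_r$ whose product is $n$:

```python
def prime_factors(n):
    """Primes p_1 <= ... <= p_r with product n, for an integer n > 1."""
    if n < 2:
        raise ValueError("n must be an integer greater than 1")
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:     # divide out p as often as possible
            factors.append(p)
            n //= p
        p += 1
    if n > 1:                 # whatever remains has no divisor <= its square root
        factors.append(n)
    return factors


print(prime_factors(135))   # [3, 3, 3, 5]
print(prime_factors(639))   # [3, 3, 71]
```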


REMARK 1.37. We shall abbreviate “the fundamental theorem of arithmetic” by “FTA.”
At times it will be useful to write the factorization of an integer $n > 1$ in a slightly different
form
$$n = p_1^{k_1} p_2^{k_2} \cdots p_t^{k_t} = \prod_{i=1}^{t} p_i^{k_i},$$
where $t \in \mathbb{Z}_{>0}$, $p_1, p_2, \ldots, p_t$ are DISTINCT primes and the $k_i \in \mathbb{Z}_{>0}$. This factorization is
again unique if we list the primes in ascending order. We can also include (unnecessary)
primes $p_i$ with exponent $k_i = 0$ in the products above. By doing so, we lose
uniqueness, but (as we shall see shortly) gain some advantages in simplifying formulae. Note
that $n = 1$ is represented by using any $t$ and all the $k_i = 0$.


COROLLARY 1.38. There are infinitely many primes. 


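PROOF. Let $p_1, p_2, \ldots, p_n$ be a collection of $n \in \mathbb{Z}_{>0}$ distinct primes. Then either
$p_1 p_2 \cdots p_n + 1$ is prime or some prime $p$ divides it. In the latter case $p \ne p_i$ for $i = 1, \ldots, n$,
since each $p_i$ divides $p_1 p_2 \cdots p_n$ and hence cannot divide $p_1 p_2 \cdots p_n + 1$. We have in all
cases produced a prime not in our list of $n$ of them. There hence must be infinitely many of
them.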
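
The argument is effective: from any finite list of primes it produces a prime not in the list. A small Python sketch of this construction (our own helper names; `math.prod` requires Python 3.8 or later):

```python
from math import prod


def smallest_prime_factor(m):
    """Smallest prime dividing the integer m > 1."""
    p = 2
    while p * p <= m:
        if m % p == 0:
            return p
        p += 1
    return m                  # m itself is prime


def new_prime(primes):
    """A prime not in the given non-empty list of distinct primes."""
    return smallest_prime_factor(prod(primes) + 1)


print(new_prime([2, 3, 5]))               # 31, since 2*3*5 + 1 = 31 is prime
print(new_prime([2, 3, 5, 7, 11, 13]))    # 59, since 30031 = 59 * 509
```

The second call shows that $p_1 p_2 \cdots p_n + 1$ need not itself be prime.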


DEFINITION 1.39. We can list (enumerate) the infinitely many primes in increasing order
as
$$p_1, p_2, \ldots, p_n, \ldots$$
Note that this means in particular, that the entries in the list continue forever, that $p_k < p_{k+1}$,
and that $p_n \ge n+1$ ($p_n > n+1$ for $n > 2$). We will from now on keep the above notation
and call $p_n$ the $n$-th prime.




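COROLLARY 1.40. Let $a$ and $b \in \mathbb{Z}_{>0}$ and write
$$a = \prod_{i=1}^{r} p_i^{n_i} \quad\text{and}\quad b = \prod_{i=1}^{r} p_i^{m_i},$$
where $r \in \mathbb{Z}_{>0}$, the $p_i$ are distinct primes, and the $n_i$ and $m_i$ are non-negative integers. Then
(i) $a \mid b$ if and only if $n_i \le m_i$ for each $i$,
(ii) $\gcd(a,b) = \prod_{i=1}^{r} p_i^{\min\{n_i, m_i\}}$,
(iii) $\operatorname{lcm}(a,b) = \prod_{i=1}^{r} p_i^{\max\{n_i, m_i\}}$, and
(iv) $\gcd(a,b) \operatorname{lcm}(a,b) = ab$.

PROOF. Part (i) is obvious. To prove (ii), let $d = \prod_{i=1}^{r} p_i^{\min\{n_i, m_i\}}$. Then by part (i),
$d \mid a$ and $d \mid b$. If $c \in \mathbb{Z}_{>0}$ divides both $a$ and $b$, then $c = \prod_{i=1}^{r} p_i^{k_i}$ with integers $0 \le k_i \le \min\{n_i, m_i\}$.
Thus $c \mid d$ and it follows that $d = (a,b)$. The proof of (iii) is similar to the last
argument and (iv) follows from the observation that for all pairs of integers $m$ and $n$,
$$m + n = \min\{m,n\} + \max\{m,n\}.$$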


EXAMPLE 1.41. Since $135 = 3^3 \cdot 5$ and $639 = 3^2 \cdot 71$, we have $\gcd(135, 639) = 3^2 = 9$ and
$\operatorname{lcm}(135, 639) = 3^3 \cdot 5 \cdot 71 = 9585$.
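
Parts (ii) to (iv) of the corollary translate directly into a computation with exponent tables. The sketch below uses our own function names and a simple trial-division factorization, so it is meant for small inputs only; a prime missing from one factorization is treated as having exponent 0 there.

```python
from collections import Counter


def factor_exponents(n):
    """Map each prime dividing the integer n >= 1 to its exponent."""
    exponents = Counter()
    p = 2
    while p * p <= n:
        while n % p == 0:
            exponents[p] += 1
            n //= p
        p += 1
    if n > 1:
        exponents[n] += 1
    return exponents


def gcd_lcm_from_exponents(a, b):
    """Return (gcd, lcm) of positive integers via min/max of exponents."""
    ea, eb = factor_exponents(a), factor_exponents(b)
    g = l = 1
    for p in set(ea) | set(eb):          # a Counter returns 0 for a missing prime
        g *= p ** min(ea[p], eb[p])
        l *= p ** max(ea[p], eb[p])
    return g, l


g, l = gcd_lcm_from_exponents(135, 639)
print(g, l, g * l == 135 * 639)   # 9 9585 True
```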


We can now formulate a proposition yielding the sieve of Eratosthenes algorithm. 


PROPOSITION 1.42. Fix an integer $N > 5$ and consider the steps and the list in the sieve
of Eratosthenes algorithm. Let $\alpha$ be the smallest integer such that $p_{\alpha+1}^2 > N$.
(a) For all $r \in \mathbb{N}$, $1 \le r \le \alpha + 1$, after $r$ steps, the first $r$ entries in our list are primes.
(b) After $\alpha + 1$ steps, the list consists only of primes.


PROOF. Part (a) is proven by induction on $r$. It is certainly true for $r = 1$. So assume
that $r > 1$ and that after $r - 1$ steps, the list starts with $r - 1$ primes. If after the $r$-th
step, $a_r$, the $r$-th element in our list were not prime, then it would be divisible by a $p_j$ with
$j \le r - 1$. But this is impossible since proper multiples of $p_j$ were eliminated from the list
at the $(j+1)$-st step. We prove part (b) by contradiction. We know by the first part that after
$\alpha + 1$ steps, the first $\alpha + 1$ entries in our list are primes: $p_1, p_2, \ldots, p_{\alpha+1}$. If an entry $a_k$ in this
list with $k > \alpha + 1$ is not prime, then since $a_k \le N < p_{\alpha+1}^2$, $a_k = bc$ with $b > 1$, $c > 1$ and one of $b$ or $c$ $< p_{\alpha+1}$. Say
that $b < p_{\alpha+1}$. By FTA, we may assume that $b$ is a prime. But then $a_k$, a proper multiple of $b$, was
eliminated from our list at the step that removed the proper multiples of $b$; a contradiction.


DEFINITION 1.43. Let $r \in \mathbb{Z}_{>1}$ and $m_1, m_2, \ldots, m_r$ a collection of $r$ integers. We say
that this set is relatively prime if $(m_i, m_j) = 1$ for all $1 \le i < j \le r$.


REMARK 1.44. The concept introduced above is stronger than the requirement that
$$(m_1, m_2, \ldots, m_r) = 1$$
as shown by the set consisting of the three integers 2, 3 and 4.


EXERCISES 


(1) Show that $n \in \mathbb{Z}_{>0}$ is a prime whenever $2^n - 1$ is.
(2) Prove that there are infinitely many primes of the form $4n + 3$, $n \in \mathbb{Z}_{>0}$.




WORKSHEET #2 


(1) (Definition) Remember that a prime number is an integer p > 1 whose only positive 
divisors are 1 and p itself. This means that a prime number does not admit a 
representation as product of two integers each strictly smaller than p and strictly 
bigger than 1. 

(2) (Factorization in MAPLE) The computer system MAPLE has a routine that computes
the factorization of integers, provided they are not too long. The appropriate
command is ifactor. For example, if one wants to know the factorization of the
number 1743756435671253155121751498513846136, one enters the command

> ifactor(1743756435671253155121751498513846136);

after a few seconds, MAPLE replies

(2)^3 (41) (96095622963481) (666787244268091) (8297)

this is the factorization of the entered integer into a product of primes.

(3) (Fermat numbers) The French mathematician Pierre de Fermat considered numbers
of the form $2^n + 1$ to provide prime numbers.

Using MAPLE, compute the first twenty numbers $2^n + 1$, and using ifactor
determine which ones are prime.

(4) From the previous computations, we can make an educated guess: only numbers of
the form $2^{2^k} + 1$ (that is, when $n = 1, 2, 4, 8, 16, \ldots$) are prime. Fermat thought that
all the numbers of the form $2^{2^k} + 1$ were prime; unfortunately he was wrong.

Using MAPLE, check that the two numbers $2^{32} + 1$ and $2^{64} + 1$ are not prime.

(5) The above computations lead us to think that if a number of the form $2^n + 1$ is
prime, then $n$ should be of the form $n = 2^k$ for some $k \ge 0$. Prove this statement.
(Hint: Assume $n$ is an odd integer $\ge 3$, then the expression $x^n + 1$ factors as
$(x + 1)(x^{n-1} - x^{n-2} + x^{n-3} - \cdots + 1)$.)

(6) (Optional) It is also possible to get primes from numbers of the form $2^n - 1$. Repeat
the above steps to guess which numbers of this form are prime.


5. The rationals, algebraic numbers and other beasts 


The reader is surely familiar with other number systems. We briefly review some of these 
— they will not be used much in this book, except to discuss examples of algebraic structures. 


5.1. The rationals, Q. The rationals can be constructed from the integers by use of
equivalence relations. Those unfamiliar with this topic should first study §3 of Chapter 2.
Let $S = \mathbb{Z} \times \mathbb{Z}_{>0}$. Thus the elements of $S$ are ordered pairs⁹ $(a,b)$ of integers with $b > 0$. We
introduce a relation $R$ on $S$ by saying that $(a,b) R (a',b')$ if and only if $ab' = ba'$. We note
that:

1. $R$ is reflexive since $(a,b) R (a,b)$,

2. $R$ is symmetric since $(a,b) R (a',b')$ obviously implies that $(a',b') R (a,b)$.

3. $R$ is transitive. To prove this assume that $(a,b) R (a',b')$ and $(a',b') R (a'',b'')$. These two
statements are equivalent to $ab' = ba'$ and $a'b'' = b'a''$. We consider cases:

3a. $a' = 0$. In this case, also $a = 0 = a''$. Thus certainly $ab'' = ba''$ or $(a,b) R (a'',b'')$.

3b. $a' \ne 0$. We start with $a'b'' = b'a''$ and multiply both sides by $a$ to obtain $aa'b'' = ab'a''$.
After substituting $ba'$ for $ab'$ in the right hand side of the last equality we obtain $aa'b'' = ba'a''$.
Since $a' \ne 0$, we can cancel it from both sides to obtain $ab'' = ba''$ as required.


⁹We are thinking, of course, of the ordered pair of integers $(a,b)$ as the fraction $\frac{a}{b}$.


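The set of equivalence classes of $R$, the set of rational numbers, is denoted by $\mathbb{Q}$ and
the equivalence class of $(a,b) \in S$ is customarily written as $\frac{a}{b}$. We define addition and
multiplication in $\mathbb{Q}$ by
$$\frac{a}{b} + \frac{c}{d} = \frac{ad + bc}{bd}, \qquad \frac{a}{b} \cdot \frac{c}{d} = \frac{ac}{bd}.$$
Since $b > 0$ and $d > 0$, both $(ad + bc, bd)$ and $(ac, bd) \in S$. We must still verify that these operations are well
defined; that is, do not depend on the choice of representatives used. So assume that $\frac{a}{b} = \frac{a'}{b'}$
and $\frac{c}{d} = \frac{c'}{d'}$. We must verify that $\frac{ad + bc}{bd} = \frac{a'd' + b'c'}{b'd'}$ and that $\frac{ac}{bd} = \frac{a'c'}{b'd'}$. We leave that as an
exercise for the reader. We note that we can think of $\mathbb{Z} \subset \mathbb{Q}$ if we identify $n \in \mathbb{Z}$ with $\frac{n}{1} \in \mathbb{Q}$.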
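
The construction can be tried out on small cases. In the following Python sketch (names are ours) a rational is represented by a pair `(a, b)` with `b > 0`, `related` implements the relation $R$, and `add` and `mul` implement the two operations on representatives. Checking a few instances does not replace the exercise, but it shows what "well defined" means here.

```python
def related(x, y):
    """(a, b) R (a2, b2) if and only if a*b2 == b*a2."""
    (a, b), (a2, b2) = x, y
    return a * b2 == b * a2


def add(x, y):
    """Representative of a/b + c/d."""
    (a, b), (c, d) = x, y
    return (a * d + b * c, b * d)


def mul(x, y):
    """Representative of (a/b)(c/d)."""
    (a, b), (c, d) = x, y
    return (a * c, b * d)


# (1, 2) and (2, 4) represent the same rational, as do (1, 3) and (3, 9).
print(add((1, 2), (1, 3)), add((2, 4), (3, 9)))            # (5, 6) (30, 36)
print(related(add((1, 2), (1, 3)), add((2, 4), (3, 9))))   # True
print(related(mul((1, 2), (1, 3)), mul((2, 4), (3, 9))))   # True
```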

What have we gained? Every non-zero rational number $\frac{a}{b}$ (thus $a \ne 0$) has a multiplicative
inverse $\frac{b}{a}$. Is this enough for most applications? The answer is a resounding no since what
we think of as simple numbers, for example $\sqrt{2}$, are not in $\mathbb{Q}$; that is, “the rationals have
holes.” To be more precise, we prove


THEOREM 1.45. For all $r \in \mathbb{Q}$, $r^2 \ne 2$.


PROOF. Assume that for some $\frac{a}{b}$ with $a$ and $b \in \mathbb{Z}$, $b > 0$, we have $\frac{a^2}{b^2} = 2$. If $d = (a,b)$,
then we write $a = da_1$ and $b = db_1$ with $a_1 \in \mathbb{Z}$, $b_1 \in \mathbb{Z}_{>0}$, and $(a_1, b_1) = 1$. Then $\frac{a}{b} = \frac{a_1}{b_1}$
and we conclude that $a_1^2 = 2b_1^2$. Thus $a_1^2$ is even and so must be $a_1$ (as a consequence of the
fundamental theorem of arithmetic). Writing $a_1 = 2c$ gives $b_1^2 = 2c^2$. Thus $b_1^2$ and hence also $b_1$ is even. We conclude that
$2 \mid (a_1, b_1)$; a contradiction. □


REMARK 1.46. The theorem states that the equation $x^2 - 2 = 0$ has no solutions in $\mathbb{Q}$.


5.2. The reals, R. The study of the reals, R, belongs properly to analysis rather than 
algebra. We confine ourselves to the briefest of discussions. The construction of the re- 
als from the rationals is more sophisticated than the construction of the rationals from the 
integers. One method is to identify the reals as the collection of certain subsets of rational- 
numbers known as Dedekind cuts. These are subsets a C Q with the following properties: 


eaA#andaf-Q 
elfaeaanddbe Qwith b<a,thenbea. 
e For all a € a there exists ab € a with b> a. 


We identify a rational r with the Dedekind cut 


{p €Q; p<r}. 
With this identification, Q C R. One must do some work to properly define addition and 
multiplication of real numbers. What have we gained? We certainly filled in some holes in 
the rationals since 
V2 = Qeo U{r €Q0 <r and r? < 2}. 


But much more has been acomplished: we have filled in all the holes in the sense that any 
set S of reals that is bounded from above, must have a least upper bound’. The proof of this 
completeness property is rather simple if one understands what the various concepts mean. 
A point s € S is a subset of Q. It hence makes sense to define *s = Usegs; this is the least 
upper bound for the set S. 


10For precise definitions consult any book on analysis. 




5.3. The complex numbers, C. Even though the real numbers are analytically complete,
they are not algebraically complete in the sense that the equation $x^2 + 1 = 0$ has no
solutions in $\mathbb{R}$. One way to remedy this problem is to artificially introduce a solution to this
equation by defining the operations of addition and multiplication on ordered pairs of real
numbers: $(a,b) \in \mathbb{R}^2$. If both $(a,b)$ and $(a',b') \in \mathbb{R}^2$, we define

$$(a,b) + (a',b') = (a+a',\ b+b') \quad\text{and}\quad (a,b)(a',b') = (aa' - bb',\ ab' + ba').$$

With this additive and multiplicative structure, $\mathbb{R}^2$ is a model for the complex numbers $\mathbb{C}$.
We will study this system further in §5 of Chapter 2. For the moment, we limit the discussion
to a few observations.

• The reals are a subset of $\mathbb{C}$ consisting of the ordered pairs $(a,0)$ with $a \in \mathbb{R}$.

• We define $i = (0,1)$. We then observe that $i^2 = -1$; that is, $\pm i$ solve the equation
$x^2 + 1 = 0$.

• The complex number $(a,b)$ is usually written as $a + bi$. The usual laws of arithmetic
(addition and multiplication) for $\mathbb{R}$ then apply to $\mathbb{C}$ with the convention that $i$ is a
new quantity ($\notin \mathbb{R}$) whose square is $-1$.

• The complex numbers are algebraically complete in the sense that every polynomial
equation (here $z$ stands for an indeterminate, $n \in \mathbb{Z}_{>0}$, $a_i \in \mathbb{C}$ for all integers $i$ with
$0 \le i < n$)

$$z^n + a_{n-1}z^{n-1} + \cdots + a_1 z + a_0 = 0$$

has a solution $z \in \mathbb{C}$.
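
The ordered-pair model of $\mathbb{C}$ is easy to experiment with. The sketch below is in Python (used here in place of MAPLE); the names `add`, `mul` and `i` are our own choices, and pairs are represented as plain tuples.

```python
def add(z, w):
    """(a, b) + (a', b') = (a + a', b + b')."""
    (a, b), (c, d) = z, w
    return (a + c, b + d)

def mul(z, w):
    """(a, b)(a', b') = (aa' - bb', ab' + ba')."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)

i = (0, 1)                    # the pair playing the role of the new quantity i
minus_one = mul(i, i)         # expected to be the real number -1, i.e. (-1, 0)
product = mul((1, 2), (3, 4)) # (1 + 2i)(3 + 4i)
```

The pair `(a, 0)` behaves exactly like the real number `a` under both operations, which is the sense in which the reals sit inside this model.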
5.4. The algebraic numbers. 
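DEFINITION 1.47. A number $\alpha \in \mathbb{C}$ is algebraic if it satisfies an equation of the form

$$a_0\alpha^n + a_1\alpha^{n-1} + \cdots + a_n = 0,$$

where $n \in \mathbb{Z}_{>0}$, $a_0 \in \mathbb{Z}_{>0}$, and $a_i \in \mathbb{Z}$ for $1 \le i \le n$. All other numbers are called
transcendental.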


REMARK 1.48.
• A complex number is algebraic if and only if it is a root of a
monic polynomial of positive degree with rational coefficients.

• It is rather obvious that each rational number is algebraic. Thus the rationals are
a subset of the algebraic numbers.

• It is not easy (it requires some preparation) to prove that the algebraic numbers
form a field (as defined in Section 1 of Chapter 5). See Chapter 9.


5.5. The quaternions, H. The number systems discussed so far, $\mathbb{Z}$, $\mathbb{Q}$ and $\mathbb{R}$, are all
subsets of $\mathbb{C}$. As a matter of fact we have the tower of proper inclusions

$$\mathbb{Z} \subset \mathbb{Q} \subset \mathbb{R} \subset \mathbb{C}.$$

Are there any number systems that are supersets of $\mathbb{C}$? The answer is yes, many. But in
going to “bigger” systems we now begin to lose rather than gain. One such system, the
quaternions, is described in discussing examples of groups in Chapter 3. In passing from the
complex numbers to the quaternions, we lose the commutativity of multiplication.


EXERCISES 




(1) In our definition of the rationals we used an intermediate set $S = \mathbb{Z} \times \mathbb{Z}_{>0}$. What
would happen if we had defined this set as $S = \mathbb{Z} \times \mathbb{Z}$?

(2) Prove that the operations of addition and multiplication on $\mathbb{Q}$ are well defined.

(3) Introduce order relations ($<$, $\le$, $>$, $\ge$) on $\mathbb{Q}$ and show that they are compatible
(agree) with the corresponding order relations on $\mathbb{Z}$.

(4) Show that the set of algebraic numbers is countable. Before doing this problem, you 
may want to review some of the material of the next chapter on cardinality. 


6. Modular arithmetic 


This section deals with what is commonly called “clock arithmetic.” It involves arithmetic 
on (for applications, large) finite sets. It will be the basis for our study of coding (in §9). 


DEFINITION 1.49. Let $n \in \mathbb{Z}_{>0}$ and $a$ and $b \in \mathbb{Z}$. We say that $a$ is congruent or equivalent
to $b$ modulo $n$ or (for short) mod $n$ (in symbols $a \equiv b \bmod n$)¹¹ provided $n | (a - b)$.


The division algorithm implies the following 


PROPOSITION 1.50. Let $n \in \mathbb{Z}_{>0}$ and $a \in \mathbb{Z}$. There exists a unique $r \in \mathbb{Z}$, $0 \le r < n$,
such that $a \equiv r \bmod n$.


DEFINITION 1.51. Let $n \in \mathbb{Z}_{>0}$ and $a \in \mathbb{Z}$. We define the congruence class of $a$ modulo
$n$ by

$$[a]_n = \{b \in \mathbb{Z};\ b \equiv a \bmod n\}.$$

An element of the set

$$[a]_n = \{\ldots, a - 3n, a - 2n, a - n, a, a + n, a + 2n, \ldots\}$$

is called a representative of the congruence class $[a]_n$. The last proposition showed how to
choose a canonical¹² representative for each congruence class; that is, an integer in the set

$$\{0, 1, \ldots, n - 1\}$$

to be called the standard representative of the class. We denote by $\mathbb{Z}_n$ the set of congruence
classes of the integers modulo $n$, and usually represent a congruence class $[a]_n \in \mathbb{Z}_n$ by an
integer $a \in [a]_n$, $0 \le a < n$. When there can be no confusion, we will denote $[a]_n$ also by $[a]$
or just $a$.


DEFINITION 1.52. Let $n \in \mathbb{Z}_{>0}$ and $a$ and $b \in \mathbb{Z}$. We define the operations of addition
($+$) and multiplication ($\cdot$)¹³ on $\mathbb{Z}_n$ by

$$[a]_n + [b]_n = [a + b]_n$$

and

$$[a]_n \cdot [b]_n = [ab]_n.$$


¹¹ Throughout this section $n$ is a positive integer fixed once and for all. The theory developed for the
case $n = 1$ is completely trivial.

¹² Meaning involving no choices.

¹³ As usual the $\cdot$ is omitted in most cases.




We must show the last definitions are well defined (make sense). First let us interpret
what the definitions say. To add (multiply) two congruence classes, say $[a]_n$ and $[b]_n$, choose
representatives $a$ and $b$ of these classes. Add (multiply) these representatives to get $a + b$ ($ab$
in case of multiplication) and then take their respective congruence classes $[a + b]_n$ ($[ab]_n$ for
multiplication). What happens if we choose different representatives $\alpha$ and $\beta$ for the classes
$[a]_n$ and $[b]_n$? We use that $[\alpha]_n = [a]_n$ and $[\beta]_n = [b]_n$ to conclude that

$$\alpha = a + kn \quad\text{and}\quad \beta = b + ln \quad\text{for some } k \text{ and } l \in \mathbb{Z}.$$

Thus

$$\alpha + \beta = a + b + (k + l)n \quad\text{and}\quad \alpha\beta = ab + (kb + la)n + kln^2 = ab + (kb + la + kln)n$$

and we conclude that

$$[\alpha + \beta]_n = [a + b]_n \quad\text{and}\quad [\alpha\beta]_n = [ab]_n$$

as required for the operations to make sense.
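
The independence of the choice of representatives can also be checked numerically. The following Python sketch (Python is used here in place of MAPLE) is only an experiment, not a proof; the function name `check_well_defined` and the parameter `span` are our own.

```python
def check_well_defined(a: int, b: int, n: int, span: int = 3) -> bool:
    """Replace a by a + k*n and b by b + l*n for small k, l and verify that
    the classes of the sum and of the product do not change."""
    for k in range(-span, span + 1):
        for l in range(-span, span + 1):
            alpha, beta = a + k * n, b + l * n
            # Python's % returns the standard representative in 0..n-1 for n > 0
            if (alpha + beta) % n != (a + b) % n:
                return False
            if (alpha * beta) % n != (a * b) % n:
                return False
    return True

ok = all(check_well_defined(a, b, 6) for a in range(6) for b in range(6))
```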
As a matter of fact the system $(\mathbb{Z}_n, +, \cdot)$ (that is, the set $\mathbb{Z}_n$ with its binary operations
$+$ and $\cdot$) shares many (but not all) properties of the more familiar system $(\mathbb{Z}, +, \cdot)$. The set
$\mathbb{Z}_n$ contains a zero element $[0]_n$ and a (multiplicative) identity $[1]_n$.¹⁴ We illustrate with the
addition and multiplication tables for $\mathbb{Z}_6$.

 + | 0 1 2 3 4 5             · | 0 1 2 3 4 5
---+------------            ---+------------
 0 | 0 1 2 3 4 5             0 | 0 0 0 0 0 0
 1 | 1 2 3 4 5 0             1 | 0 1 2 3 4 5
 2 | 2 3 4 5 0 1     and     2 | 0 2 4 0 2 4
 3 | 3 4 5 0 1 2             3 | 0 3 0 3 0 3
 4 | 4 5 0 1 2 3             4 | 0 4 2 0 4 2
 5 | 5 0 1 2 3 4             5 | 0 5 4 3 2 1


In all our tables on congruence arithmetic $a$ denotes $[a]_n$ (with $n$ understood from the
context). In the above two tables, we have listed the elements of $\mathbb{Z}_6$ in the first rows and
first columns. In the first (second) of these tables the sum (product) $a + b$ ($ab$) appears in
the intersection of the row indexed by $a$ and the column indexed by $b$. Notice and explain
the symmetries in the above tables. The addition tables are rather easy to construct. Some
more work is required to produce the multiplication tables. We reproduce here the MAPLE
programs that give in matrix form the multiplication tables for $\mathbb{Z}_{17}$ and $\mathbb{Z}_{24}$. We then print
the resulting matrices in standard format.


MAPLE SESSION #3.


> k := 17:
> aa := array(1..k,1..k):
> for i to k do for j to k do aa[i,j] := (i-1) * (j-1) mod k end do end do:
> print(aa);


¹⁴ We will show later in the book that the system $(\mathbb{Z}_n, +, \cdot)$ forms a commutative ring. When there can
be no confusion, we will use the symbol $\mathbb{Z}_n$ to represent this set, the commutative group $(\mathbb{Z}_n, +)$ or the ring
$(\mathbb{Z}_n, +, \cdot)$.




***END OF PROGRAM***


MULTIPLICATION MATRIX FOR Z_17

 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
 0  2  4  6  8 10 12 14 16  1  3  5  7  9 11 13 15
 0  3  6  9 12 15  1  4  7 10 13 16  2  5  8 11 14
 0  4  8 12 16  3  7 11 15  2  6 10 14  1  5  9 13
 0  5 10 15  3  8 13  1  6 11 16  4  9 14  2  7 12
 0  6 12  1  7 13  2  8 14  3  9 15  4 10 16  5 11
 0  7 14  4 11  1  8 15  5 12  2  9 16  6 13  3 10
 0  8 16  7 15  6 14  5 13  4 12  3 11  2 10  1  9
 0  9  1 10  2 11  3 12  4 13  5 14  6 15  7 16  8
 0 10  3 13  6 16  9  2 12  5 15  8  1 11  4 14  7
 0 11  5 16 10  4 15  9  3 14  8  2 13  7  1 12  6
 0 12  7  2 14  9  4 16 11  6  1 13  8  3 15 10  5
 0 13  9  5  1 14 10  6  2 15 11  7  3 16 12  8  4
 0 14 11  8  5  2 16 13 10  7  4  1 15 12  9  6  3
 0 15 13 11  9  7  5  3  1 16 14 12 10  8  6  4  2
 0 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1


MULTIPLICATION MATRIX FOR Z_24

 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 0  2  4  6  8 10 12 14 16 18 20 22  0  2  4  6  8 10 12 14 16 18 20 22
 0  3  6  9 12 15 18 21  0  3  6  9 12 15 18 21  0  3  6  9 12 15 18 21
 0  4  8 12 16 20  0  4  8 12 16 20  0  4  8 12 16 20  0  4  8 12 16 20
 0  5 10 15 20  1  6 11 16 21  2  7 12 17 22  3  8 13 18 23  4  9 14 19
 0  6 12 18  0  6 12 18  0  6 12 18  0  6 12 18  0  6 12 18  0  6 12 18
 0  7 14 21  4 11 18  1  8 15 22  5 12 19  2  9 16 23  6 13 20  3 10 17
 0  8 16  0  8 16  0  8 16  0  8 16  0  8 16  0  8 16  0  8 16  0  8 16
 0  9 18  3 12 21  6 15  0  9 18  3 12 21  6 15  0  9 18  3 12 21  6 15
 0 10 20  6 16  2 12 22  8 18  4 14  0 10 20  6 16  2 12 22  8 18  4 14
 0 11 22  9 20  7 18  5 16  3 14  1 12 23 10 21  8 19  6 17  4 15  2 13
 0 12  0 12  0 12  0 12  0 12  0 12  0 12  0 12  0 12  0 12  0 12  0 12
 0 13  2 15  4 17  6 19  8 21 10 23 12  1 14  3 16  5 18  7 20  9 22 11
 0 14  4 18  8 22 12  2 16  6 20 10  0 14  4 18  8 22 12  2 16  6 20 10
 0 15  6 21 12  3 18  9  0 15  6 21 12  3 18  9  0 15  6 21 12  3 18  9
 0 16  8  0 16  8  0 16  8  0 16  8  0 16  8  0 16  8  0 16  8  0 16  8
 0 17 10  3 20 13  6 23 16  9  2 19 12  5 22 15  8  1 18 11  4 21 14  7
 0 18 12  6  0 18 12  6  0 18 12  6  0 18 12  6  0 18 12  6  0 18 12  6
 0 19 14  9  4 23 18 13  8  3 22 17 12  7  2 21 16 11  6  1 20 15 10  5
 0 20 16 12  8  4  0 20 16 12  8  4  0 20 16 12  8  4  0 20 16 12  8  4
 0 21 18 15 12  9  6  3  0 21 18 15 12  9  6  3  0 21 18 15 12  9  6  3
 0 22 20 18 16 14 12 10  8  6  4  2  0 22 20 18 16 14 12 10  8  6  4  2
 0 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1

The above program needs no explanatory remarks. In the output, we have omitted the
previously used first column and first row. So the $(i,j)$-entry of the output matrix is the
standard representative of $[(i - 1)(j - 1)]_n$. The first columns and first rows of the output
matrices are, of course, superfluous.
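
For readers without access to MAPLE, the same matrices can be produced in Python. This is a minimal sketch, with rows and columns indexed from 0 so that entry `[i][j]` is the standard representative of $[ij]_n$; the function names are our own.

```python
def multiplication_table(n: int) -> list[list[int]]:
    """Return the n x n matrix whose (i, j)-entry is i*j reduced modulo n."""
    return [[(i * j) % n for j in range(n)] for i in range(n)]

def format_table(table: list[list[int]]) -> str:
    """Render a table with right-aligned columns, one row per line."""
    width = len(str(len(table) - 1))
    return "\n".join(" ".join(f"{x:>{width}}" for x in row) for row in table)

table17 = multiplication_table(17)
table6 = multiplication_table(6)
print(format_table(table6))
```

Because multiplication of congruence classes is commutative, the matrix returned is always symmetric.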




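EXAMPLE 1.53. Let us show that $11 | (10! + 1)$ or equivalently that $10! + 1 \equiv 0 \bmod 11$.
We do not do the brute force calculation, but reduce modulo 11 at each step (using
$4 \cdot 3 \equiv 1$, $10 \cdot 9 \equiv 2$, $8 \cdot 7 \equiv 1$ and $2 \cdot 6 \equiv 1 \bmod 11$). Start with

$$10! = 10\cdot 9\cdot 8\cdot 7\cdot 6\cdot 5\,(4\cdot 3\cdot 2) \equiv (10\cdot 9)\,8\cdot 7\cdot 6\,(5\cdot 2) \equiv 2\,(8\cdot 7)\,6\cdot 10 \equiv 2\cdot 1\cdot 6\cdot 10 = (2\cdot 6)\,10 \equiv 10 \bmod 11.$$

Hence

$$10! + 1 \equiv 10 + 1 \equiv 0 \bmod 11.$$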
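
The strategy of reducing after every multiplication, rather than computing $10!$ first, can be sketched in Python as follows (Python is used here in place of MAPLE; the name `factorial_mod` is our own).

```python
from math import factorial

def factorial_mod(k: int, n: int) -> int:
    """Compute k! modulo n, reducing after each multiplication so that
    intermediate values never exceed n * k."""
    result = 1 % n
    for j in range(2, k + 1):
        result = (result * j) % n
    return result

reduced = factorial_mod(10, 11)          # step-by-step reduction
brute = (factorial(10) + 1) % 11         # brute force, for comparison
```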


DEFINITION 1.54. Let $n \in \mathbb{Z}_{>1}$ and $a \in \mathbb{Z}$. We say that $[a]_n$ is invertible in $\mathbb{Z}_n$ or has an
inverse (modulo $n$) if there exists a $b \in \mathbb{Z}$ such that $[a]_n[b]_n = [1]_n$. The invertible elements
in $\mathbb{Z}_n$ are also called units. We say that a non-zero congruence class $[a]_n$ is a zero divisor
(modulo $n$) if there exists an integer $b$ such that $[b]_n \neq [0]_n$ but $[a]_n[b]_n = [0]_n$.


THEOREM 1.55. Let $n \in \mathbb{Z}_{>1}$ and $a \in \mathbb{Z}$. Then $[a]_n$ has an inverse modulo $n$ if and only
if $(a,n) = 1$. If in fact $r$ and $s \in \mathbb{Z}$ satisfy $ar + sn = 1$, then $[r]_n$ is an inverse of $[a]_n$.


PROOF. Suppose that $[a]$ is invertible¹⁵ with inverse $[k]$ ($k \in \mathbb{Z}$). Then $ak \equiv 1 \bmod n$;
that is, $n | (ak - 1)$. Therefore there exists a $t \in \mathbb{Z}$ such that $nt = ak - 1$. This implies that
$(a,n) = 1$. Conversely, if $(a,n) = 1$, then there exist integers $r$ and $s$ such that $ar + sn = 1$.
Therefore $n | (1 - ar)$ and $ar \equiv 1 \bmod n$; the last equation says $[a]_n[r]_n = [1]_n$. □


PROPOSITION 1.56. Let $n \in \mathbb{Z}_{>1}$ and $a \in \mathbb{Z}$. If $[a]_n$ is invertible modulo $n$, then its
inverse $[b]_n$ is unique and is hence written as $[a]_n^{-1}$.


PROOF. If for $c \in \mathbb{Z}$, $[c]_n$ is also an inverse of $[a]_n$, then $[a]([b] - [c]) = [0]$. Thus $n | a(b - c)$
and since $(a,n) = 1$, $n | (b - c)$. Thus $[b] = [c]$. □


EXAMPLE 1.57. Since $1 = -91 \cdot 507 + 118 \cdot 391$, $[391]_{507}^{-1} = [118]_{507}$ (and $[116]_{391}^{-1} = [300]_{391}$).


EXAMPLE 1.58. Since $(215,795) = 5$, 215 does not have an inverse modulo 795 and 795
does not have an inverse modulo 215. Note that $[795]_{215} = [150]_{215}$.


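EXAMPLE 1.59. It is rather obvious that $(73,23) = 1$. So that both $[73]_{23}^{-1}$ and $[23]_{73}^{-1}$
exist. To find them, we proceed to express $(73,23)$ as a linear combination of 73 and 23
using the GCD algorithm (each row $(r\ \ s \mid g)$ records the identity $73r + 23s = g$):

$$\left[\begin{array}{rr|r} 1 & 0 & 73 \\ 0 & 1 & 23 \end{array}\right] \to \left[\begin{array}{rr|r} 1 & -3 & 4 \\ 0 & 1 & 23 \end{array}\right] \to \left[\begin{array}{rr|r} 1 & -3 & 4 \\ -5 & 16 & 3 \end{array}\right] \to \left[\begin{array}{rr|r} 6 & -19 & 1 \\ -5 & 16 & 3 \end{array}\right].$$

Thus $6 \cdot 73 - 19 \cdot 23 = 1$, $[23]_{73}^{-1} = [-19]_{73} = [54]_{73}$ and $[73]_{23}^{-1} = [6]_{23}$.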
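
The computation of inverses via the GCD algorithm can be automated. The following Python sketch (Python is used here in place of MAPLE) carries the triples $(r, s, g)$ with $ar + bs = g$ through the Euclidean algorithm; the names `extended_gcd` and `inverse_mod` are our own, and the code assumes $a, b \ge 0$ and $n > 1$.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, r, s) with g = gcd(a, b) and r*a + s*b = g (a, b >= 0)."""
    r0, s0, g0 = 1, 0, a      # invariant: r0*a + s0*b == g0
    r1, s1, g1 = 0, 1, b      # invariant: r1*a + s1*b == g1
    while g1 != 0:
        q = g0 // g1
        r0, s0, g0, r1, s1, g1 = r1, s1, g1, r0 - q * r1, s0 - q * s1, g0 - q * g1
    return g0, r0, s0

def inverse_mod(a: int, n: int):
    """Return the standard representative of the inverse of [a]_n,
    or None when (a, n) != 1 (Theorem 1.55)."""
    g, r, _ = extended_gcd(a % n, n)
    if g != 1:
        return None
    return r % n

g, r, s = extended_gcd(73, 23)   # coefficients expressing the gcd
```

The examples above can then be re-derived with calls such as `inverse_mod(23, 73)` and `inverse_mod(391, 507)`.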


COROLLARY 1.60. Let $n \in \mathbb{Z}_{>1}$ and $a$, $b$ and $c \in \mathbb{Z}$. If $(n,c) = 1$ and $ac \equiv bc \bmod n$,
then $a \equiv b \bmod n$.


PROOF. We rewrite the congruence $ac \equiv bc \bmod n$ as $[a]_n[c]_n = [b]_n[c]_n$. Since $n$ and $c$
are relatively prime, $[c]_n^{-1}$ exists and the corollary follows by multiplying each side of the last
equality by $[c]_n^{-1}$. □


COROLLARY 1.61. Let $n \in \mathbb{Z}_{>0}$. Then each non-zero $[a]_n$ is either invertible or a zero
divisor, but not both.


¹⁵ The subscript $n$ is dropped since it is fixed throughout the argument. When clear from the context we
will also drop the $[\ ]$ from the notation.




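PROOF. Let $a \in \mathbb{Z}$ and assume that $[a]_n$ is non-zero and not invertible. Thus $d = (n,a) >
1$. It follows that $a = kd$ and $n = ld$ for some integers $k$ and $l > 1$. Since $d > 1$, we have
$1 < l < n$ and hence $[l]_n \neq [0]_n$. Now $al = kld = kn$ is divisible by $n$. Thus $[a]_n[l]_n = [0]_n$;
that is, $[a]_n$ is a zero divisor. Finally, an invertible $[a]_n$ cannot be a zero divisor: if
$[a]_n[b]_n = [0]_n$, then $[b]_n = [a]_n^{-1}[a]_n[b]_n = [0]_n$. □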
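
The dichotomy of Corollary 1.61 can be observed by brute force for small moduli. This Python sketch (Python is used here in place of MAPLE) lists units and zero divisors directly from Definition 1.54 and Theorem 1.55; the function name `classify` is our own.

```python
from math import gcd

def classify(n: int) -> tuple[list[int], list[int]]:
    """Return (units, zero_divisors) among the non-zero classes 1..n-1 of Z_n."""
    units = [a for a in range(1, n) if gcd(a, n) == 1]
    zero_divisors = [
        a for a in range(1, n)
        if any((a * b) % n == 0 for b in range(1, n))  # some non-zero b kills a
    ]
    return units, zero_divisors

units6, zero_divisors6 = classify(6)
```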


COROLLARY 1.62. Let $p \in \mathbb{Z}_{>0}$ be a prime. Then every non-zero element in $\mathbb{Z}_p$ is
invertible.


DEFINITION 1.63. For each $n \in \mathbb{Z}_{>0}$, we let $\mathbb{Z}_n^*$ be the set of invertible congruence classes
in $\mathbb{Z}_n$.


THEOREM 1.64. Let $n \in \mathbb{Z}_{>1}$. Then $\mathbb{Z}_n^*$ is closed under multiplication; that is, if $[a]$ and
$[b]$ belong to $\mathbb{Z}_n^*$, then so does $[a][b]$.


PROOF. Let $p$ be a prime. Since $(n,a) = 1$, either $p$ does not divide $n$ or $p$ does not
divide $a$. Similarly, either $p$ does not divide $n$ or $p$ does not divide $b$. Hence either $p$ does
not divide $n$ or $p$ does not divide $ab$. Since this holds for every prime $p$, it must be the case
that $(n, ab) = 1$. □


Note that $\mathbb{Z}_8^*$ consists of $\{[1], [3], [5], [7]\}$ and thus $|\mathbb{Z}_8^*| = 4$. It is easy to construct by
brute force the multiplication table for $\mathbb{Z}_8^*$. It is, of course, a subset of the multiplication table
for $\mathbb{Z}_8$ (that the reader should construct and compare to the one below) and is described by

 · | 1 3 5 7
---+--------
 1 | 1 3 5 7
 3 | 3 1 7 5
 5 | 5 7 1 3
 7 | 7 5 3 1


For larger $n$, we may use MAPLE to construct the multiplication tables. The construction
of the first of these tables for the prime 17 offers no challenges. It can readily be modified
to produce the multiplication table for $\mathbb{Z}_n^*$ for any prime $n$ by merely changing the first line
of the program.

MAPLE SESSION #4.


> n := 17:
> f := x -> x:
> aa := array(1..(n-1),1..(n-1)):
> for i to (n-1) do for j to (n-1) do aa[i,j] := f(i) * f(j) mod n end do end do:
> print(aa);




 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
 2  4  6  8 10 12 14 16  1  3  5  7  9 11 13 15
 3  6  9 12 15  1  4  7 10 13 16  2  5  8 11 14
 4  8 12 16  3  7 11 15  2  6 10 14  1  5  9 13
 5 10 15  3  8 13  1  6 11 16  4  9 14  2  7 12
 6 12  1  7 13  2  8 14  3  9 15  4 10 16  5 11
 7 14  4 11  1  8 15  5 12  2  9 16  6 13  3 10
 8 16  7 15  6 14  5 13  4 12  3 11  2 10  1  9
 9  1 10  2 11  3 12  4 13  5 14  6 15  7 16  8
10  3 13  6 16  9  2 12  5 15  8  1 11  4 14  7
11  5 16 10  4 15  9  3 14  8  2 13  7  1 12  6
12  7  2 14  9  4 16 11  6  1 13  8  3 15 10  5
13  9  5  1 14 10  6  2 15 11  7  3 16 12  8  4
14 11  8  5  2 16 13 10  7  4  1 15 12  9  6  3
15 13 11  9  7  5  3  1 16 14 12 10  8  6  4  2
16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1


***END OF PROGRAM*** 


Compare the last output with that of the previous MAPLE session for the prime
17. What is the difference? The program for non-primes is more interesting.


MAPLE SESSION #5.


> n := 24:
> zstarn := select(x -> if (gcd(x,n)=1) then true; else false; end if, {seq(i,i=1..n)});

                zstarn := {1, 5, 7, 11, 13, 17, 19, 23}

> with(numtheory):

Warning, the protected name order has been redefined and unprotected

> phi(n);

                8

> aa := array(1..phi(n),1..phi(n)):
> for i to (phi(n)) do for j to (phi(n)) do aa[i,j] := zstarn[i] * zstarn[j] mod n end do end do;
> print(aa);

 1  5  7 11 13 17 19 23
 5  1 11  7 17 13 23 19
 7 11  1  5 19 23 13 17
11  7  5  1 23 19 17 13
13 17 19 23  1  5  7 11
17 13 23 19  5  1 11  7
19 23 13 17  7 11  1  5
23 19 17 13 11  7  5  1




***END OF PROGRAM***


The second command of the above program produces the list of $\varphi(n)$¹⁶ positive integers
that are $\le n$ and relatively prime to $n$. The entry phi(n) was included only as a very weak
consistency check on our program. By changing 24 to 72 in the first line of the program, we
obtain the multiplication table for $\mathbb{Z}_{72}^*$. It results in the following table.


 1  5  7 11 13 17 19 23 25 29 31 35 37 41 43 47 49 53 55 59 61 65 67 71
 5 25 35 55 65 13 23 43 53  1 11 31 41 61 71 19 29 49 59  7 17 37 47 67
 7 35 49  5 19 47 61 17 31 59  1 29 43 71 13 41 55 11 25 53 67 23 37 65
11 55  5 49 71 43 65 37 59 31 53 25 47 19 41 13 35  7 29  1 23 67 17 61
13 65 19 71 25  5 31 11 37 17 43 23 49 29 55 35 61 41 67 47  1 53  7 59
17 13 47 43  5  1 35 31 65 61 23 19 53 49 11  7 41 37 71 67 29 25 59 55
19 23 61 65 31 35  1  5 43 47 13 17 55 59 25 29 67 71 37 41  7 11 49 53
23 43 17 37 11 31  5 25 71 19 65 13 59  7 53  1 47 67 41 61 35 55 29 49
25 53 31 59 37 65 43 71 49  5 55 11 61 17 67 23  1 29  7 35 13 41 19 47
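
The construction of MAPLE SESSION #5 translates directly to Python. The sketch below is not the MAPLE program itself; the names `units` and `unit_table` are our own. It also lets us check experimentally the closure property of Theorem 1.64: every entry of the table is again a unit.

```python
from math import gcd

def units(n: int) -> list[int]:
    """Increasing list of the integers j with 1 <= j <= n and (j, n) = 1."""
    return [j for j in range(1, n + 1) if gcd(j, n) == 1]

def unit_table(n: int) -> list[list[int]]:
    """Multiplication table of the invertible classes modulo n,
    with rows and columns indexed by units(n)."""
    u = units(n)
    return [[(x * y) % n for y in u] for x in u]

table24 = unit_table(24)
closed24 = all(entry in units(24) for row in table24 for entry in row)
```

Calling `unit_table(72)` produces the full 24 by 24 table whose first rows are displayed above.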


EXERCISES 


(1) Let $n$ and $m$ be positive integers. Consider the set $\mathbb{Z}_n \times \mathbb{Z}_m$ of ordered pairs $(a,b)$
with $a \in \mathbb{Z}_n$ and $b \in \mathbb{Z}_m$. Introduce a multiplication on such pairs by defining
$(a,b)(c,d) = (ac, bd)$.

• Show this multiplication is well defined.

• Construct this multiplication for $\mathbb{Z}_3 \times \mathbb{Z}_5$ and compare it to the multiplication
table for $\mathbb{Z}_{15}$.

• Construct the multiplication table for $\mathbb{Z}_4 \times \mathbb{Z}_6$ and compare it to the
multiplication table for $\mathbb{Z}_{24}$.

(2) Fix a positive integer $n$. The multiplication table for $\mathbb{Z}_n^*$ that we have been using
is a $\varphi(n) \times \varphi(n)$ matrix $M$ constructed as follows. Let $x_1, x_2, \ldots, x_{\varphi(n)}$ be the increasing
list of integers $j$ with $1 \le j \le n$ and $(j,n) = 1$. The $(i,j)$-entry of the matrix $M$ is
the standard representative of $[x_i x_j]$. Show that $M$ is a symmetric matrix. What
property of multiplication of congruence classes does the symmetry of $M$ reflect?

(3) In this exercise we study simple divisibility tests for positive integers $N$.

• Show that $N$ is divisible by 3 if and only if the sum of the digits in $N$ is.
• Devise and establish similar (or simpler) divisibility tests for 4, 5, 6 and 7.


¹⁶ The Euler $\varphi$-function is introduced in Section 8.




7. Solutions of linear congruences 
We are interested in solving linear congruences; that is, equations of the form

(7) $ax \equiv b \bmod n$ equivalently $[a]_n X = [b]_n$,

where $n \in \mathbb{Z}_{>0}$, $a$ and $b \in \mathbb{Z}$ are fixed and we are looking for integers $x$ that satisfy (in
the equivalent formulation we are looking for $X$, equivalence classes of integers modulo $n$)
equation (7). If $a = 0$ or $a = 1$, we already know the answer, so we may assume from now on
that $1 < a < n$. However, we work without this restriction. It pays for us to concentrate on
the formulation in terms of equivalence classes. There are some immediate differences from
ordinary equations:


• The equation $[2]_3 X = [1]_3$ has a unique solution $X = [2]_3$.
• The equation $[2]_4 X = [1]_4$ has no solutions.
• The equation $[2]_4 X = [0]_4$ has two solutions: $X = [0]_4$ and $[2]_4$.


This is more or less the general picture as seen in 


THEOREM 1.65. Let $n \in \mathbb{Z}_{>0}$, $a$, $b \in \mathbb{Z}$. Then (7) has solutions if and only if $d = (a,n) | b$.
If $d | b$, there are exactly $d$ congruence classes of solutions modulo $n$ and all these solutions
are congruent modulo $\frac{n}{d}$.


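PROOF. For $c \in \mathbb{Z}$, let $[c]_n$ be a solution to (7). Then $ac \equiv b \bmod n$; that is, $[a]_n[c]_n =
[b]_n$ or $ac - nk = b$ for some $k \in \mathbb{Z}$. Thus $d | b$. Conversely, assume that $d | b$. We divide (7) by
$d$ and obtain

(8) $\frac{a}{d}\,x \equiv \frac{b}{d} \bmod \frac{n}{d}$ or equivalently $\left[\frac{a}{d}\right]_{\frac{n}{d}} X = \left[\frac{b}{d}\right]_{\frac{n}{d}}$,

where in the last equation $X$ represents a congruence class modulo $\frac{n}{d}$. It is easy to see that
as equations for unknown integers $x$, (7) and (8) are equivalent; an $x \in \mathbb{Z}$ that solves one
also solves the other. The same is true for equivalence classes $X$ once one understands that
an equivalence class $X$ modulo $\frac{n}{d}$ that solves (8) corresponds to $d$ equivalence classes modulo $n$
that solve (7). To see this, pick any $x \in X$ and let $X_0 = [x]_n$. Then $X_0 \subset X$ and we define
for any integer $\alpha$,

$$X_\alpha = X_0 + \alpha = \{y + \alpha;\ y \in X_0\} = [x + \alpha]_n.$$

It is easily seen that

$$X = X_0 \cup X_{\frac{n}{d}} \cup X_{\frac{2n}{d}} \cup \cdots \cup X_{\frac{(d-1)n}{d}}.$$

Since $\left(\frac{a}{d}, \frac{n}{d}\right) = 1$, (8) has a unique solution $X = \left[\frac{a}{d}\right]_{\frac{n}{d}}^{-1}\left[\frac{b}{d}\right]_{\frac{n}{d}}$. □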


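REMARK 1.66. It is useful to consider various special cases (we assume throughout that $d | b$).


• $n = 1$: In this case every $x \in \mathbb{Z}$ is a solution and $X = \mathbb{Z}$.

• $n | a$ or equivalently $[a]_n = [0]_n$: In this case $d = n$, so that $n | b$ and, once again,
every $x \in \mathbb{Z}$ is a solution. In terms of equivalence classes, every class is a solution
(there are $n$ such solutions).

• $[a]_n$ is a zero divisor (in particular $[a]_n \neq [0]_n$): In this case, we consider two subcases:
(a) $\left[\frac{b}{d}\right]_{\frac{n}{d}} = [0]_{\frac{n}{d}}$: $[0]_{\frac{n}{d}}$ is the only solution of (8) and thus (7) has $d$ solutions

$$[0]_n,\ \left[\tfrac{n}{d}\right]_n,\ \ldots,\ \left[(d-1)\tfrac{n}{d}\right]_n;$$

all but the first of these are zero divisors, and
(b) $\left[\frac{b}{d}\right]_{\frac{n}{d}} \neq [0]_{\frac{n}{d}}$: Let $[\alpha]_{\frac{n}{d}}$, with $\alpha \in \mathbb{Z}$ such that $0 < \alpha < \frac{n}{d}$, be the unique solution
of (8); then (7) has $d$ solutions

$$[\alpha]_n,\ \left[\alpha + \tfrac{n}{d}\right]_n,\ \ldots,\ \left[\alpha + (d-1)\tfrac{n}{d}\right]_n.$$

These need not be zero divisors: for example, $2x \equiv 2 \bmod 4$ has the solutions $[1]_4$ and $[3]_4$.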


COROLLARY 1.67. Equation (7) has a unique equivalence class of solutions if and only 
if d = (a,n) = 1; in particular, whenever n is a prime and n does not divide a. 


EXAMPLE 1.68. To solve $6x \equiv 2 \bmod 17$ we note that since 17 is a prime and 6 is not
a multiple of 17, the unique solution is $X = [6]_{17}^{-1}[2]_{17} = [3]_{17}[2]_{17} = [6]_{17}$.


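EXAMPLE 1.69. To solve a more substantial looking problem like $432x \equiv 12 \bmod 546$
we need to do some more work. It is easily seen that $432 = 2^4 3^3$ and $546 = 2 \cdot 3 \cdot 7 \cdot 13$
and thus $6 = (432,546)$. Hence $[72]_{91} X = [2]_{91}$ has a unique solution. To find it we need to
calculate $[72]_{91}^{-1}$. This is, perhaps, best done by the GCD algorithm (each row $(r\ \ s \mid g)$
records the identity $91r + 72s = g$):

$$\left[\begin{array}{rr|r} 1 & 0 & 91 \\ 0 & 1 & 72 \end{array}\right] \to \left[\begin{array}{rr|r} 1 & -1 & 19 \\ 0 & 1 & 72 \end{array}\right] \to \left[\begin{array}{rr|r} 1 & -1 & 19 \\ -3 & 4 & 15 \end{array}\right] \to \left[\begin{array}{rr|r} 4 & -5 & 4 \\ -3 & 4 & 15 \end{array}\right]$$

$$\to \left[\begin{array}{rr|r} 4 & -5 & 4 \\ -15 & 19 & 3 \end{array}\right] \to \left[\begin{array}{rr|r} 19 & -24 & 1 \\ -15 & 19 & 3 \end{array}\right].$$

We read off that $19 \cdot 91 - 24 \cdot 72 = 1$; which tells us that $[72]_{91}^{-1} = [-24]_{91} = [67]_{91}$. So that
$X = [67]_{91}[2]_{91} = [43]_{91}$. In terms of congruence classes modulo 546, we have 6 solutions; namely,

$$[43]_{546},\ [134]_{546},\ [225]_{546},\ [316]_{546},\ [407]_{546},\ [498]_{546}.$$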
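
The procedure of Theorem 1.65 (divide by $d$, invert, then lift to $d$ classes modulo $n$) can be sketched in Python as follows. Python is used here in place of MAPLE; the name `solve_linear_congruence` is our own, and the built-in three-argument `pow` with exponent `-1` (available in Python 3.8 and later) is used to compute the inverse.

```python
from math import gcd

def solve_linear_congruence(a: int, b: int, n: int) -> list[int]:
    """Return the standard representatives of all classes X modulo n
    with [a]_n X = [b]_n (an empty list when (a, n) does not divide b)."""
    d = gcd(a, n)
    if b % d != 0:
        return []                       # no solutions: d does not divide b
    a1, b1, n1 = a // d, b // d, n // d
    if n1 == 1:
        x0 = 0                          # every integer solves the reduced equation
    else:
        # unique solution of the reduced congruence (8) modulo n/d
        x0 = (pow(a1 % n1, -1, n1) * b1) % n1
    # lift to the d classes modulo n that are congruent to x0 modulo n/d
    return [x0 + k * n1 for k in range(d)]

solutions = solve_linear_congruence(432, 12, 546)
```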


THEOREM 1.70 (The Chinese remainder theorem, CRT). Let $r \in \mathbb{Z}_{>0}$ and let $m_1, m_2, \ldots, m_r$
be pairwise relatively prime positive integers. Let $a_1, a_2, \ldots, a_r$ be any set of integers. Then the system
of congruences

$$x \equiv a_i \bmod m_i, \quad i = 1, \ldots, r$$

has a unique solution modulo $M = m_1 m_2 \cdots m_r$.


PROOF. The theorem is obviously true for $r = 1$. So assume that $r > 1$. Observe however
that the argument that follows holds (with appropriate understanding of symbols) for the
case $r = 1$. Let $M_k = \frac{M}{m_k} = m_1 \cdots m_{k-1}\widehat{m_k}m_{k+1} \cdots m_r$, where $\widehat{m_k}$ indicates that the $m_k$ term is
missing from the product. Then since $(m_i, m_j) = 1$ if $i \neq j$, we conclude that $(m_i, M_i) = 1$.
Thus there exists $y_k \in \mathbb{Z}$ such that $[y_k]_{m_k} = [M_k]_{m_k}^{-1}$. We set

$$x = a_1 M_1 y_1 + a_2 M_2 y_2 + \cdots + a_r M_r y_r.$$

Then

$$[x]_{m_i} = [a_1]_{m_i}[M_1]_{m_i}[y_1]_{m_i} + [a_2]_{m_i}[M_2]_{m_i}[y_2]_{m_i} + \cdots + [a_r]_{m_i}[M_r]_{m_i}[y_r]_{m_i}.$$

But $[M_j]_{m_i} = 0$ if $i \neq j$. Hence

$$[x]_{m_i} = [a_i]_{m_i}[M_i]_{m_i}[y_i]_{m_i} = [a_i]_{m_i}.$$

If $y$ is another solution of the system of congruences, then for each $i$, $m_i | (y - x)$; hence
$\operatorname{lcm}(m_1, m_2, \ldots, m_r) = M | (y - x)$. □
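
The proof is constructive, and the construction $x = \sum a_i M_i y_i$ can be sketched in Python as follows. Python is used here in place of MAPLE; the function name `crt` is our own, each modulus is assumed to be larger than 1, and `pow(M_i, -1, m_i)` (Python 3.8 and later) raises `ValueError` when the moduli are not pairwise relatively prime.

```python
from math import prod

def crt(residues: list[int], moduli: list[int]) -> tuple[int, int]:
    """Return (x, M) with x the standard representative modulo
    M = m_1 * ... * m_r of the solution of x = a_i mod m_i for all i."""
    M = prod(moduli)
    x = 0
    for a_i, m_i in zip(residues, moduli):
        M_i = M // m_i                  # product of all the other moduli
        y_i = pow(M_i, -1, m_i)         # inverse of [M_i] modulo m_i
        x += a_i * M_i * y_i
    return x % M, M

x, M = crt([2, 0, 3], [7, 9, 4])        # x = 2 mod 7, x = 0 mod 9, x = 3 mod 4
```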




EXAMPLE 1.71. To solve simultaneously (in $\mathbb{Z}$) the three equations

$$x \equiv 2 \bmod 7,$$

$$x \equiv 0 \bmod 9$$

and

$$2x \equiv 6 \bmod 8,$$

we first solve the last equation to obtain $x \equiv 3$ or $7 \bmod 8$; which is equivalent to $x \equiv 3
\bmod 4$. We can replace our system of equations by the equivalent system

$$x \equiv 2 \bmod 7,$$

$$x \equiv 0 \bmod 9$$

and

$$x \equiv 3 \bmod 4,$$

whose solution is

$$x \equiv 2 \cdot 36 \cdot 1 + 0 + 3 \cdot 63 \cdot 3 = 639 \equiv 135 \bmod 252.$$


It is clear that computers should be of use in applications of CRT. We illustrate what
can and cannot be done in MAPLE. Our first example is the solution of the system

$$x \equiv 12 \bmod 13,$$

$$x \equiv 13 \bmod 14,$$

$$x \equiv 14 \bmod 23$$

and

$$x \equiv 15 \bmod 25.$$

It would be quite a time consuming task to rely solely on calculators to solve this problem.
We approach this problem as an algorithm involving a few steps.

1. We first verify that the hypotheses of CRT are satisfied (that is, that the moduli are
pairwise relatively prime).

2. We know that the solution is a congruence class $a \bmod M$. We compute $M$ as the
product of the moduli.

3. We determine next the smallest positive $a$. It is in the intersection of several sets that
are easily described; the $k$-th set is a subset of the set of solutions to the $k$-th equation.

MAPLE SESSION #6.


> gcd(13,14);
                1
> gcd(13,23);
                1
> gcd(13,25);
                1
> gcd(14,23);
                1
> gcd(14,25);
                1
> gcd(23,25);
                1
> 13 * 14 * 23 * 25;
                104650
> set1 := {seq( 12 + 13*i,i=0..200)}: set2 := {seq( 13 +
14*i,i=0..200)}: set3 := {seq( 14 + 23*i,i=0..200)}: set4 := {seq(
15 + 25*i,i=0..200)}:
> set1 intersect set2 intersect set3 intersect set4;
                {}
> set1 := {seq( 12 + 13*i,i=0..20000)}: set2 := {seq( 13 +
14*i,i=0..20000)}: set3 := {seq( 14 + 23*i,i=0..20000)}: set4 :=
{seq( 15 + 25*i,i=0..20000)}:
> set1 intersect set2 intersect set3 intersect set4;
                {34215, 138865, 243515}
> chrem([12,13,14,15], [13,14,23,25]);
                34215


***END OF PROGRAM***


• The first 6 lines of the program are just a check on the hypothesis for the Chinese
remainder theorem. They can be combined into a single command as is done in the
next program.

• The 8th line of the program defines the sets whose intersection should yield a solution.
However we have truncated the sets too quickly as is shown by the next line.

• The next to the last section of the program shows that the simultaneous solutions
to the set of four equations is the congruence class

$$34215 \bmod 104650$$

(we use the single calculation $104650 = 138865 - 34215$).

• The last section of the program shows the single MAPLE command needed to get
the smallest positive solution.


A slightly different picture emerges in the MAPLE solution to a similar set of congruences
involving bigger numbers

$$x \equiv 12 \bmod 993,$$

$$x \equiv 13 \bmod 994,$$

$$x \equiv 14 \bmod 1003$$

and

$$x \equiv 15 \bmod 1007.$$

MAPLE SESSION #7.


> [gcd(993,994),
> gcd(993,1003),
> gcd(993,1007),
> gcd(994,1003),
> gcd(994,1007), gcd(1003,1007)];
                [1, 1, 1, 1, 1, 1]
> 993 * 994 * 1003 * 1007;
                996933147882
> set1 := {seq( 12 + 993*i,i=0..200)}: set2 := {seq( 13 +
994*i,i=0..200)}: set3 := {seq( 14 + 1003*i,i=0..200)}: set4 :=
{seq( 15 + 1007*i,i=0..200)}:
> set1 intersect set2 intersect set3 intersect set4;
                {}
> set1 := {seq( 12 + 993*i,i=0..20000)}: set2 := {seq( 13 +
994*i,i=0..20000)}: set3 := {seq( 14 + 1003*i,i=0..20000)}: set4 :=
{seq( 15 + 1007*i,i=0..20000)}:
> set1 intersect set2 intersect set3 intersect set4;
                {}
> set1 := {seq( 12 + 993*i,i=0..2000000)}: set2 := {seq( 13 +
994*i,i=0..2000000)}: set3 := {seq( 14 + 1003*i,i=0..2000000)}:
set4 := {seq( 15 + 1007*i,i=0..2000000)}:

Warning, computation interrupted

> chrem([12,13,14,15], [993,994,1003,1007]);
                630257901363


***END OF PROGRAM***


Note that the naive calculations that look for a solution as an intersection of four sets
run into time problems (hence the interruption). However, the chrem MAPLE command
is powerful enough to perform its calculation in a very short period of time. It follows that
MAPLE uses a more sophisticated algorithm in solving simultaneous congruence equations.
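
We do not know which algorithm MAPLE uses internally. One simple improvement over intersecting four large sets, sketched here in Python under the assumption that the moduli are pairwise relatively prime, is to satisfy the congruences one at a time: once the first $k$ congruences hold for $x$, we only step through $x, x + m_1 \cdots m_k, x + 2m_1 \cdots m_k, \ldots$, and at most $m_{k+1}$ steps are needed. The name `crt_by_sieving` is our own.

```python
def crt_by_sieving(residues: list[int], moduli: list[int]) -> tuple[int, int]:
    """Solve x = a_i mod m_i by successive search.

    Returns (x, M) with 0 <= x < M = product of the moduli.
    Raises ValueError if some congruence cannot be satisfied."""
    x, step = 0, 1
    for a_i, m_i in zip(residues, moduli):
        for _ in range(m_i):            # at most m_i candidates are distinct mod m_i
            if x % m_i == a_i % m_i:
                break
            x += step                   # keeps all earlier congruences intact
        else:
            raise ValueError("the system has no simultaneous solution")
        step *= m_i
    return x, step

x_big, M_big = crt_by_sieving([12, 13, 14, 15], [993, 994, 1003, 1007])
```

For the system of MAPLE SESSION #7 this performs at most about four thousand comparisons, instead of building sets with millions of elements.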


REMARK 1.72. The hypothesis that $(m_i, m_j) = 1$ for $i \neq j$ cannot be replaced by the
weaker hypothesis $(m_1, m_2, \ldots, m_r) = 1$ as is easily shown by the example

$$x \equiv 0 \bmod 2,$$

$$x \equiv 0 \bmod 3$$

and

$$x \equiv 1 \bmod 4$$

that has no solutions; since the first of these three equations says that $x$ must be even and
the last that it must be odd.


We end this section with a brief introduction to non-linear congruences. 




EXAMPLE 1.73. We start with the equation

$$x^2 - 1 \equiv 0 \bmod n.$$

It is equivalent to the equation

$$(x - 1)(x + 1) \equiv 0 \bmod n.$$

So if $x$ is a solution, then $x - 1$ and $x + 1$ are either 0 or zero divisors. Thus if $n$ is prime,
the only solutions are $x = 1$ and $x = n - 1$ (modulo $n$, of course). For composite $n$, $x = 1$ and
$x = n - 1$ are still solutions. But there may be others (extra solutions) as well. They are to
be found among the $x$ for which both $x \pm 1$ are zero divisors. The zero divisors for $n = 6$
are 2, 3 and 4. Thus only $x = 3$ could be an extra solution, but it is not. For $n = 8$, the
solutions are 1, 3, 5 and 7. The zero divisors in this case are 2, 4 and 6. Each pair of these
that differ by 2, namely (2, 4) and (4, 6), accounts for one extra solution ($x = 3$ and $x = 5$,
respectively). For $n = 25$, the zero divisors are 5, 10, 15 and 20; no two of them differ by 2.
Hence, only 1 and 24 are solutions.
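
For small $n$ the solutions can simply be enumerated. The following Python sketch (Python is used here in place of MAPLE; the function name is our own) lists the standard representatives $x$ with $x^2 \equiv 1 \bmod n$.

```python
def square_roots_of_one(n: int) -> list[int]:
    """Standard representatives x in 0..n-1 with x^2 - 1 = 0 mod n."""
    return [x for x in range(n) if (x * x - 1) % n == 0]

roots6 = square_roots_of_one(6)
roots8 = square_roots_of_one(8)
roots25 = square_roots_of_one(25)
```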


EXAMPLE 1.74. We continue with the equation

$$x^2 + 1 \equiv 0 \bmod n$$

and try to find its roots. Once again the answer depends on $n$. For $n = 2$, 1 is a solution.
There are no solutions for $n = 3$ and 4. For $n = 5$, $x = 2$ and 3 are the solutions. Obviously
this is a place where a symbolic computation program will help. Using MAPLE (a worksheet
is included below), we see that for $n = 125$, $x = 57$ and 68 are solutions leading us to the
factorization

$$x^2 + 1 \equiv (x - 57)(x - 68) \bmod 125.$$

Perhaps more surprising is the case $n = 65$ where 8, 18, 47 and 57 are solutions giving us
two factorizations

$$x^2 + 1 \equiv (x - 8)(x - 57) \bmod 65$$

and

$$x^2 + 1 \equiv (x - 18)(x - 47) \bmod 65.$$

A self-explanatory MAPLE program (reproduced below) facilitates the computations in this
example.


MAPLE SESSION #8.


> msolve({x^2 + 1 = 0}, 2);
                {x = 1}
> msolve({x^2 + 1 = 0}, 3);
> msolve({x^2 + 1 = 0}, 4);
> msolve({x^2 + 1 = 0}, 5);
                {x = 2}, {x = 3}
> msolve({x^2 + 1 = 0}, 125);
                {x = 57}, {x = 68}
> msolve({x^2 + 1 = 0}, 65);
                {x = 8}, {x = 18}, {x = 47}, {x = 57}
> msolve({x^2 + 1 = 0}, 13);
                {x = 5}, {x = 8}
> msolve({x^2 + 1 = 0}, 17);
                {x = 4}, {x = 13}
> msolve({x^2 + 1 = 0}, 19);


***END OF PROGRAM***
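
A brute-force stand-in for the computations of this session can be written in a few lines of Python. The function `roots_mod` below is our own (it is not a MAPLE or Python library routine); it evaluates the polynomial at every residue by Horner's rule, which is adequate only for small moduli.

```python
def roots_mod(coeffs: list[int], n: int) -> list[int]:
    """Roots in 0..n-1 of the polynomial with the given integer coefficients
    (highest degree first), taken modulo n."""
    def value(x: int) -> int:
        acc = 0
        for c in coeffs:                # Horner's rule, reducing at every step
            acc = (acc * x + c) % n
        return acc
    return [x for x in range(n) if value(x) == 0]

x2_plus_1 = [1, 0, 1]                   # coefficients of x^2 + 1
roots = {n: roots_mod(x2_plus_1, n) for n in (2, 3, 4, 5, 125, 65, 13, 17, 19)}
```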


EXAMPLE 1.75. Our last example of this section is a brief discussion of the polynomial

(9) $x^3 - x^2 + x + 2$.

We investigate whether it can have any integral roots. One can learn from more advanced
algebra books (or courses) that the only possible integral roots of the polynomial are $\pm 1$ and
$\pm 2$ and that none of these are in fact roots to conclude that the equation has no integer roots
(see Section 10). Without more advanced results we can come to the same conclusion using
modular arithmetic. If (9) had an integer root, then for every $n \in \mathbb{Z}_{>1}$, reducing modulo $n$,
we would certainly have a root in $\mathbb{Z}_n$. Let $f(r) = r^3 - r^2 + r + 2$. Then $f(0) = 2$, $f(1) = 3$,
$f(2) = 8$, $f(3) = 23$ and $f(4) = 54$. So the equation $f(r) \equiv 0 \bmod 5$ has no solutions.
Hence (9) cannot have any integral solutions.
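
The five evaluations in this argument are easily checked by machine. The short Python sketch below (Python in place of MAPLE; the names `f` and `residues_mod_5` are our own) reduces $f(0), \ldots, f(4)$ modulo 5.

```python
def f(r: int) -> int:
    """The polynomial (9) evaluated at the integer r."""
    return r**3 - r**2 + r + 2

values = [f(r) for r in range(5)]              # f(0), ..., f(4)
residues_mod_5 = [v % 5 for v in values]       # none of these should be 0
```

Since every integer is congruent modulo 5 to one of 0, 1, 2, 3, 4, the absence of a zero residue rules out integer roots.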


8. Euler 


Let us fix a positive integer $n$. For $a \in \mathbb{Z}$, we have determined conditions that guarantee
the existence of $[a]_n^{-1}$ and algorithms for computing it. Two results that give formulae for
this inverse (due to Fermat for $n$ prime and to Euler for the general case) turn out to have
good applications to cryptography.


DEFINITION 1.76. Let $n \in \mathbb{Z}_{>0}$ and $a \in \mathbb{Z}$. The integer $a$ (or the equivalence class $[a]_n$)
has finite multiplicative order modulo $n$ if there exists a $k \in \mathbb{Z}_{>0}$ such that

$$[a]_n^k \ (= [a^k]_n) = [1]_n.$$

If $a$ has finite multiplicative order modulo $n$, then the smallest $k$ as above is the (multiplicative)
order of $a$ (and of $[a]_n$) modulo $n$; in symbols $\operatorname{ord}_n a$ ($\operatorname{ord}\,[a]_n$) or $\operatorname{ord}\, a$ ($\operatorname{ord}\,[a]$) when
the $n$ is clear from the context.


REMARK 1.77. Only $[1]_n$ has order 1.


THEOREM 1.78. Let $n \in \mathbb{Z}_{>0}$ and $a \in \mathbb{Z}$. The following conditions are equivalent:

(a) The integer $a$ has finite multiplicative order modulo $n$.
(b) $(a,n) = 1$.
(c) $[a]_n$ is invertible.
(d) $[a]_n \neq [0]_n$ and $[a]_n$ is not a zero divisor.


PROOF. The equivalence of conditions (b), (c) and (d) has already been established. If
$a$ has finite multiplicative order, then $[a]_n^k = [a]_n[a]_n^{k-1} = [1]_n$ and so $[a]_n$ is invertible. Thus
(a) implies (c). To establish the converse (that (c) implies (a)), consider the list of $n + 1$
elements

$$[a]_n,\ [a]_n^2,\ \ldots,\ [a]_n^{n+1}$$

in $\mathbb{Z}_n$. Since $|\mathbb{Z}_n| = n$, two elements in the list must be the same. Thus

$$[a]_n^k = [a]_n^t \quad\text{for some } 1 \le k < t \le n + 1.$$

If $[a]_n$ is invertible, we can multiply both sides of the last equation by $[a]_n^{-k}$ and conclude
that $[1]_n = [a]_n^{t-k}$. □


THEOREM 1.79. Suppose that $a \in \mathbb{Z}$ has order $k$ modulo $n$. Then
$a^r \equiv a^s \bmod n$ if and only if $r \equiv s \bmod k$.


PROOF. The fact that $a$ has order $k$ modulo $n$ means that $a^k = 1 + nw$ for some $w \in \mathbb{Z}$.
If $r \equiv s \bmod k$, then (interchanging $r$ and $s$ if necessary) $r = s + tk$ for some $t \in \mathbb{Z}_{\ge 0}$. Then

$$a^r = a^{s+tk} = a^s (a^k)^t = a^s (1 + nw)^t \equiv a^s \bmod n.$$

Conversely, if $a^r \equiv a^s \bmod n$, we may without loss of generality assume that $r \le s$. Since
$(a,n) = 1$, $[a]_n^{-1}$ exists and we conclude that $1 \equiv a^{s-r} \bmod n$. The division algorithm tells
us that $s - r = qk + u$ for some $q$ and $u \in \mathbb{Z}$ with $0 \le u < k$. Therefore $1 \equiv a^{s-r} \equiv a^u \bmod n$.
Since $k$ is the smallest positive integer with $a^k \equiv 1 \bmod n$, we conclude that $u = 0$. □


THEOREM 1.80 (Fermat’s little theorem). Let $p$ be a prime and suppose that $a \in \mathbb{Z}$ is
not a multiple of $p$. Then $a^{p-1} \equiv 1 \bmod p$. Hence for all $a \in \mathbb{Z}$, $a^p \equiv a \bmod p$.


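PROOF. The group¹⁷ $\mathbb{Z}_p^*$ has $p - 1$ elements:

$$[1]_p,\ [2]_p,\ \ldots,\ [p - 1]_p.$$

For $[a]_p \in \mathbb{Z}_p^*$, denote by $[a]_p\mathbb{Z}_p^*$ the set of multiples of $[a]_p$ in $\mathbb{Z}_p^*$:

$$[a]_p\mathbb{Z}_p^* = \{[a]_p[b]_p;\ [b]_p \in \mathbb{Z}_p^*\} = \{[a]_p[1]_p,\ [a]_p[2]_p,\ \ldots,\ [a]_p[p - 1]_p\} \subseteq \mathbb{Z}_p^*.$$

We observe next that $|[a]_p\mathbb{Z}_p^*| = p - 1$ because no two of the listed products are equal. (If $[a]_p[b]_p =
[a]_p[c]_p$, then since $[a]_p$ is invertible, $[b]_p = [c]_p$.) Hence $[a]_p\mathbb{Z}_p^* = \mathbb{Z}_p^*$ and, multiplying
together all the elements of this set in two ways,

$$[1]_p[2]_p \cdots [p - 1]_p = [a]_p[1]_p\,[a]_p[2]_p \cdots [a]_p[p - 1]_p = [a]_p^{p-1}\,[1]_p[2]_p \cdots [p - 1]_p$$

and it follows (by cancellation) that $[a]_p^{p-1} = [1]_p$. This establishes the first part of the
theorem; also that $a^p \equiv a \bmod p$ whenever $p$ does not divide $a$. But this last assertion is
trivial for multiples of $p$. □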


COROLLARY 1.81. Let $p$ be a prime and suppose that $a \in \mathbb{Z}$ is not divisible by $p$. Then the order of $a \bmod p$ divides $(p - 1)$.

PROOF. The theorem shows that $[a]_p^{p-1} = [1]_p$. If $k$ is the order of $a \bmod p$, then $[a]_p^k = [1]_p$ and Theorem 1.79 tells us that $p - 1 \equiv k \bmod k$; that is, $k$ divides $p - 1$.


EXAMPLE 1.82. For all primes $p$, the order of $[1]_p$ is, of course, 1 and the order of $[p - 1]_p$ is 2 provided $p$ is odd. All possibilities allowed by the theorem do occur. In $\mathbb{Z}_7^*$, for example, the orders of the units $[1]_7$, $[2]_7$, $[3]_7$, $[4]_7$, $[5]_7$ and $[6]_7$ are 1, 3, 6, 3, 6 and 2, respectively.
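The orders listed for $\mathbb{Z}_7^*$, together with Fermat's little theorem for $p = 7$, can be recomputed by brute force. This is an illustrative sketch; the function name `order_mod_p` is an assumption made for this example.

```python
def order_mod_p(a: int, p: int) -> int:
    """Multiplicative order of a modulo the prime p (a not divisible by p)."""
    k, power = 1, a % p
    while power != 1:
        power = (power * a) % p
        k += 1
    return k

p = 7
orders = {a: order_mod_p(a, p) for a in range(1, p)}
fermat_ok = all(pow(a, p - 1, p) == 1 for a in range(1, p))

print(orders)     # each order divides p - 1 = 6
print(fermat_ok)
```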


We now come to one of the most important functions in number theory, the Euler $\varphi$-function.


DEFINITION 1.83. For each $n \in \mathbb{Z}_{>0}$, we let $\varphi(n) = |\mathbb{Z}_n^*|$. Thus $\varphi(n)$ is the number of positive integers $\le n$ that are relatively prime to $n$. (Note that $\varphi(1) = 1$.)


$^{17}$ Language to be justified later.




The reader should check that the entries in the following table are correct. 


n    | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14
φ(n) | 1 | 1 | 2 | 2 | 4 | 2 | 6 | 4 | 6 |  4 | 10 |  4 | 12 |  6
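One way to carry out the suggested check of the table is to count directly from Definition 1.83. The sketch below does exactly that; it is a brute-force count meant only for small $n$.

```python
from math import gcd

def phi(n: int) -> int:
    """Number of integers k with 1 <= k <= n and gcd(k, n) == 1."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

table = [phi(n) for n in range(1, 15)]
print(table)
```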


It is perhaps surprising that there is an easy formula for $\varphi(n)$. We now begin the journey toward Theorem 1.86.


PROPOSITION 1.84. If $p$ is a prime and $n \in \mathbb{Z}_{>0}$, then
$$\varphi(p^n) = p^n - p^{n-1}.$$

PROOF. The only integers between 1 and $p^n$ (including both ends) that are not relatively prime to $p^n$ are the multiples of $p$, namely
$$p,\ 2p,\ 3p,\ \ldots,\ p^{n-1}p.$$
There are exactly $p^{n-1}$ such integers.


THEOREM 1.85. If $a$ and $b$ are relatively prime positive integers, then
$$\varphi(ab) = \varphi(a)\varphi(b).$$

PROOF. Since $\varphi(1) = 1$, there is nothing to prove if either $a$ or $b = 1$. So assume that both are in $\mathbb{Z}_{>1}$. The theorem says that the number of elements in $\mathbb{Z}_{ab}^*$, $|\mathbb{Z}_{ab}^*|$, is the product of $|\mathbb{Z}_a^*|$ and $|\mathbb{Z}_b^*|$. We construct a one-to-one surjective map$^{18}$
$$f : \mathbb{Z}_a^* \times \mathbb{Z}_b^* \to \mathbb{Z}_{ab}^*.$$
A point in the direct product $\mathbb{Z}_a^* \times \mathbb{Z}_b^*$ is a pair $([r]_a, [s]_b)$, where $r$ and $s$ are positive integers relatively prime to $a$ and $b$, respectively. By the Chinese remainder theorem, there is a unique congruence class $[t]_{ab}$ (the notation is meant to imply that we are choosing a representative $t \in \mathbb{Z}$ of this class) such that
$$t \equiv r \bmod a \quad \text{and} \quad t \equiv s \bmod b.$$
Since $t = r + ka$ for some $k \in \mathbb{Z}$ and $(r, a) = 1$, we conclude that $(t, a) = 1$. Similarly $(t, b) = 1$. Thus $(t, ab) = 1$; that is, $[t]_{ab} \in \mathbb{Z}_{ab}^*$. Hence we define $f$ by
$$f([r]_a, [s]_b) = [t]_{ab}.$$
We need to show that the map $f$ is an isomorphism (one-to-one and onto). We proceed indirectly by constructing a map
$$g : \mathbb{Z}_{ab}^* \to \mathbb{Z}_a^* \times \mathbb{Z}_b^*$$
that is an inverse of $f$. Let $[t]_{ab} \in \mathbb{Z}_{ab}^*$. Choose the unique $r$ and $s \in \mathbb{Z}$ such that $0 < r < a$, $0 < s < b$, $[t]_a = [r]_a$ and $[t]_b = [s]_b$. We define
$$g([t]_{ab}) = ([r]_a, [s]_b).$$
We must show that $[r]_a \in \mathbb{Z}_a^*$; that is, that $(a, r) = 1$. This is shown as follows. Since $(ab, t) = 1$, we must also have that $(a, t) = 1$. Since $t \equiv r \bmod a$ it follows from $(a, t) = 1$ that $(a, r) = 1$. Similarly we see that $[s]_b \in \mathbb{Z}_b^*$. It is clear from the uniqueness part of the Chinese remainder theorem that $f \circ g$ is the identity self map of $\mathbb{Z}_{ab}^*$. Hence $g$ is injective and $f$ is surjective. The fact that $g$ is one-to-one tells us that $|\mathbb{Z}_{ab}^*| \le |\mathbb{Z}_a^*||\mathbb{Z}_b^*|$. Moreover, $g \circ f$ is the identity self map of $\mathbb{Z}_a^* \times \mathbb{Z}_b^*$ (because $t \equiv r \bmod a$ and $t \equiv s \bmod b$), so $f$ is also one-to-one, which tells us that $|\mathbb{Z}_a^*||\mathbb{Z}_b^*| \le |\mathbb{Z}_{ab}^*|$. The last two inequalities imply equality.
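The map $g$ of the proof simply reduces a unit modulo $ab$ to its residues modulo $a$ and modulo $b$. The following sketch, with helper names chosen here for illustration, checks on small examples that this reduction is a bijection onto the product of the unit sets.

```python
from math import gcd

def units(n: int) -> list[int]:
    """Standard representatives of the units of Z_n."""
    return [r for r in range(n) if gcd(r, n) == 1]

def g(t: int, a: int, b: int) -> tuple[int, int]:
    """Reduce a representative t modulo a and modulo b."""
    return (t % a, t % b)

def g_is_bijection(a: int, b: int) -> bool:
    if gcd(a, b) != 1:
        raise ValueError("a and b must be relatively prime")
    source = units(a * b)
    image = {g(t, a, b) for t in source}
    target = {(r, s) for r in units(a) for s in units(b)}
    # Onto: the image is the whole product; one-to-one: no collisions.
    return image == target and len(image) == len(source)

print(g_is_bijection(4, 9), len(units(36)), len(units(4)) * len(units(9)))
```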


$^{18}$ As we shall see later, an isomorphism between groups.




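THEOREM 1.86. Let $n \in \mathbb{Z}_{>1}$ have prime factorization $n = \prod_{i=1}^{r} p_i^{n_i}$ with the $p_i$ distinct primes and the exponents $n_i > 0$. Then
$$\varphi(n) = \prod_{i=1}^{r} \left(p_i^{n_i} - p_i^{n_i - 1}\right) = n \prod_{i=1}^{r} \left(1 - \frac{1}{p_i}\right).$$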


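PROOF. The first formula is proven by induction on $r$. The base case $r = 1$ is the content of Proposition 1.84. So assume that we have the formula for $r = k \ge 1$ and proceed to establish it for $r = k + 1$. Write
$$n = \prod_{i=1}^{k+1} p_i^{n_i} = \left(\prod_{i=1}^{k} p_i^{n_i}\right) p_{k+1}^{n_{k+1}}.$$
Since $\left(\prod_{i=1}^{k} p_i^{n_i},\ p_{k+1}^{n_{k+1}}\right) = 1$, we conclude by the induction hypothesis and the last theorem that
$$\varphi\left(\prod_{i=1}^{k+1} p_i^{n_i}\right) = \varphi\left(\prod_{i=1}^{k} p_i^{n_i}\right) \varphi\left(p_{k+1}^{n_{k+1}}\right) = \left(\prod_{i=1}^{k} \left(p_i^{n_i} - p_i^{n_i - 1}\right)\right) \left(p_{k+1}^{n_{k+1}} - p_{k+1}^{n_{k+1} - 1}\right) = \prod_{i=1}^{k+1} \left(p_i^{n_i} - p_i^{n_i - 1}\right).$$
This finishes the proof of the first equality; the second equality is a consequence of an easy algebraic manipulation.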
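The product formula translates directly into a computation once $n$ has been factored. The sketch below uses trial division, which is an assumption made for simplicity and is adequate only for small $n$; the function names are hypothetical.

```python
def prime_factorization(n: int) -> dict[int, int]:
    """Map each prime divisor of n to its exponent (trial division)."""
    factors: dict[int, int] = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def phi_formula(n: int) -> int:
    """Product of p**e - p**(e - 1) over the prime powers p**e dividing n exactly."""
    result = 1
    for p, e in prime_factorization(n).items():
        result *= p**e - p**(e - 1)
    return result

print([phi_formula(n) for n in range(1, 15)], phi_formula(100))
```

For $n = 1$ the factorization is empty and the empty product gives $\varphi(1) = 1$, in agreement with Definition 1.83.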


Our next result is a generalization of Fermat’s little theorem. 


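THEOREM 1.87 (Euler). Let $n \in \mathbb{Z}_{>1}$ and suppose that $a \in \mathbb{Z}$ is relatively prime to $n$. Then $a^{\varphi(n)} \equiv 1 \bmod n$.


PROOF. The argument here is a generalization of the one used to establish Fermat’s little theorem. The group $\mathbb{Z}_n^*$ has $\varphi(n)$ elements. As before, $[a]_n \in \mathbb{Z}_n^*$ and $[a]_n\mathbb{Z}_n^*$ denotes the set of multiples of $[a]_n$ in $\mathbb{Z}_n^*$: $[a]_n\mathbb{Z}_n^* = \{[a]_n[b]_n;\ [b]_n \in \mathbb{Z}_n^*\}$. As in the earlier proof, $|[a]_n\mathbb{Z}_n^*| = \varphi(n)$ because no two of the products $[a]_n[b]_n$ are equal. Hence
$$\prod_{b \in \mathbb{Z}_n^*} b = \prod_{b \in \mathbb{Z}_n^*} [a]_n b = [a]_n^{\varphi(n)} \prod_{b \in \mathbb{Z}_n^*} b$$
and Euler’s theorem follows by cancellation.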


COROLLARY 1.88. Let $n \in \mathbb{Z}_{>1}$ and suppose that $a \in \mathbb{Z}$ is relatively prime to $n$. Then the order of $a \bmod n$ divides $\varphi(n)$.


EXAMPLE 1.89. We determine the congruence class mod 14 of $3^{19}$. We can, of course, use the MAPLE command > 3^19 mod 14; and receive 3 as the answer. But this problem can be solved and was solved before and without MAPLE. The order of 3 modulo 14 divides $\varphi(14) = 6$. Hence $3^6 \equiv 1 \bmod 14$ and thus $3^{19} \equiv 3 \bmod 14$.


EXAMPLE 1.90. We determine the last two digits of $7^{2962}$. Again MAPLE readily supplies the answer; although it (version 7) has trouble obtaining the last two digits of $7^{2962!}$. The last two digits of an integer are determined by its congruence class mod 100. The order of 7 mod 100 divides $\varphi(100) = \varphi(2^2 5^2) = 2 \cdot 20 = 40$. Hence $7^{40r} \equiv 1 \bmod 100$ for every positive integer $r$. Thus $7^{2962} = 7^{74 \cdot 40 + 2} \equiv 49 \bmod 100$ and $7^{2962!} \equiv 1 \bmod 100$ because $40 \mid 2962!$. Can we determine this way the last two digits of $6^{2962}$? Not exactly as before. $6^{2962} = 2^{2962} 3^{2962}$. As before $3^{2962} \equiv 9 \bmod 100$. But at this point we do not have the tools to conclude (without a brute force calculation using MAPLE, for example) that $2^{2962} \equiv 4 \bmod 100$.
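The computations in the last two examples can be reproduced with Python's built-in three-argument `pow`, which performs modular exponentiation without forming the huge power. This is offered only as a modern stand-in for the MAPLE commands mentioned in the text; the variable names are chosen for this illustration.

```python
from math import factorial

class_mod_14 = pow(3, 19, 14)                        # congruence class of 3^19 mod 14
last_two_of_7 = pow(7, 2962, 100)                    # last two digits of 7^2962
last_two_of_7_fact = pow(7, factorial(2962), 100)    # exponent 2962!
last_two_of_2 = pow(2, 2962, 100)
last_two_of_3 = pow(3, 2962, 100)
last_two_of_6 = pow(6, 2962, 100)

print(class_mod_14, last_two_of_7, last_two_of_7_fact)
print(last_two_of_2, last_two_of_3, last_two_of_6)
```

The brute-force value of $2^{2962} \bmod 100$ obtained this way is the one the example says cannot yet be justified by the theory developed so far.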


DEFINITION 1.91. Let $a \in \mathbb{Z}$ with $(a, n) = 1$. The (multiplicative) inverse of $[a]_n \in \mathbb{Z}_n^*$ is the unique $[b]_n \in \mathbb{Z}_n^*$ with $[a][b] = [ab] = [1]$. We have seen that if $[a]$ has order $k$, then its inverse exists and $[b] = [a]^{k-1} = [a]^{\varphi(n)-1}$.

REMARK 1.92. We have been studying three systems. The language will be established in subsequent chapters:

• $(\mathbb{Z}_n, +, \cdot)$ is a commutative ring.
• $(\mathbb{Z}_n, +)$ is a cyclic abelian group of order $n$ with generator $[1]$.
• $(\mathbb{Z}_n^*, \cdot)$ is an abelian group of order $\varphi(n)$; usually not cyclic.


EXERCISES 

(1) Find the orders of 

(a) 2 mod 31, 

(b) 3 mod 75 and 

(c) 4 mod 27. 
(2) Show that $a \in \mathbb{Z}$ has order $k$ modulo $n$ if and only if $k \in \mathbb{Z}_{>0}$ is the smallest integer such that $a^k - 1 = nw$ for some $w \in \mathbb{Z}$.
(3) Prove that $a$ and $a^5$ have the same last digit for all $a \in \mathbb{Z}_{>0}$.
(4) Let $a$ and $b \in \mathbb{Z}$. Prove that


‘ Gre 5) (a) 7 (sate) : 


(5) For what positive integers $n$ is $\varphi(n) < 8$?
(6) In this exercise we study an additive version of the (multiplicative) order function.
• Verify the entries in


Order of elements of $\mathbb{Z}_{24}^*$


a     | 1 | 5 | 7 | 11 | 13 | 17 | 19 | 23
ord a | 1 | 2 | 2 |  2 |  2 |  2 |  2 |  2


• Let $n \in \mathbb{Z}_{>0}$ and let $a \in \mathbb{Z}_n$. In analogy with the definition of the multiplicative order of $a \in \mathbb{Z}_n^*$, define the additive order of $a \in \mathbb{Z}_n$.
• Your definition should produce the following:


Order of elements of $\mathbb{Z}_{24}$


a     | 0 |  1 |  2 | 3 | 4 |  5 | 6 |  7 | 8 | 9 | 10 | 11
ord a | 1 | 24 | 12 | 8 | 6 | 24 | 4 | 24 | 3 | 8 | 12 | 24


a     | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23
ord a |  2 | 24 | 12 |  8 |  3 | 24 |  4 | 24 |  6 |  8 | 12 | 24




9. Public key cryptography 


Let us assume that we have a large number, $N \in \mathbb{Z}_{>0}$, of people who want to communicate (say on the web) in a more or less secure manner. Electronically, we can only transmit numbers. So the first task is to translate the letters of our alphabet to numbers. We can certainly choose a subset of the integers between 1 and 100 to accomplish this. Say we set up the following correspondence


A = 03, ..., D = 17, ..., Y = 77, ...,
blank = 93, , = 98, . = 99,


known as a code book or an encryption algorithm. The message 170377 would hence be read as DAY. We share the code book between our set of $N$ correspondents. We now have no problem sending messages to each other. If our code is secure (more on this later), only the $N$ communicators
$$\Pi_1,\ \Pi_2,\ \ldots,\ \Pi_N$$
can code (changing the letters to numbers) and decode (changing the numbers back to the “correct” letters) messages. There is a problem with this scheme. Say that two communicators $\Pi_1$ and $\Pi_2$ want to keep the information they exchange from all the other communicators $\Pi_j$, $j > 2$. This can certainly be accomplished if each pair of communicators had their own code book. But this would require a huge number, $\binom{N}{2} = \frac{N(N-1)}{2}$, of code books.

We consider a way to cut down the number of code books. Say that $\Pi_1$ wants to be able to receive messages from each $\Pi_j$, $j > 1$, in such a way that if $j \neq k$, $j$ and $k > 1$, then neither $\Pi_j$ nor $\Pi_k$ can decode the other’s message. This can be accomplished through what is known as a public key code. In this system $\Pi_1$ publishes (for everyone to know) an encryption algorithm that only s/he can decode. Sounds hard, but it is really easy in theory. We describe the RSA system, one of several that accomplish this, named after its inventors: Rivest, Shamir and Adleman.

We ($\Pi_1$) start(s) by selecting two very large distinct primes $p$ and $q$ and forming their product $n = pq$, which is the base for the encryption algorithm. We know that
$$\varphi(n) = (p - 1)(q - 1);$$
but a knowledge of $n$, which we publish, is insufficient to determine $\varphi(n)$ since in practice it is very difficult to factor large integers$^{19}$. Next we choose a number $a$ relatively prime to $\varphi(n)$, the exponent for the encryption algorithm, which we also publish. Assign a positive number (consisting, for convenience, of a fixed number of digits) to each letter in the alphabet (the alphabet usually includes the ordinary (English) letters, the digits 0 through 9, punctuation marks, and other special symbols) to form a dictionary, which is also part of the public knowledge. A message to be transmitted now consists of a large number of positive integers which, when written sequentially, form a digited message. Break the digited message into blocks $b$ of length less than the number of digits$^{20}$ in $p$ and in $q$, but larger than the number of symbols in two letters of the alphabet (so that no block contains only zeros). The block $b$ is a non-zero integer. By construction $1 \le b < p$ and $1 \le b < q$. In particular, neither $p$ nor $q$ divides $b$, and thus $(b, n) = 1$. We encode the block $b$ by computing the standard representative $m$ of $b^a \bmod n$. The encoding can be done by anyone who knows the dictionary (the construction of


$^{19}$ This is the key to the method.
$^{20}$ Remember that in practice $p$, $q$ and $b$ are very large.


$b$), the base ($n$) and the exponent ($a$). We ($\Pi_1$) need (needs) to recover $b$ from $m$. By the construction we outlined, $(b, n) = 1$ and hence $[b]_n$ is a unit in $\mathbb{Z}_n$ (thus in $\mathbb{Z}_n^*$). So is $[m]_n$, a power of $[b]_n$. Raising $[m]_n$ to the $x$-th power is the same as raising $[b]_n$ to the $ax$ power. An appropriate power, for example, $1 + \alpha\varphi(n)$ or $1 + \beta\,\mathrm{ord}_n b$ (with $\alpha$ and $\beta$ arbitrary integers), of $[b]_n$ yields back $[b]_n$.

Now $\Pi_1$ kept $p$ and $q$ (equivalently $\varphi(n)$) secret. S/he will use this information to choose the appropriate $x$. We know that $1 = (a, \varphi(n))$; thus 1 can be written as an integral linear combination
$$ax + \varphi(n)y$$
of $a$ and $\varphi(n)$. The integers $x$ and $y$ are computed by the methods at our disposal: namely, the Euclidean or GCD algorithms. Anyone who knows $\varphi(n)$ can obtain $x$ (and hence decode the message), but only $\Pi_1$ knows this value. Even the sender of the message cannot decode what s/he sent, if s/he forgot to keep a copy of the message before it was encoded. With the above value of $x$, we have
$$ax = 1 - y\varphi(n)$$
and thus
$$m^x \equiv b^{ax} \equiv b \bmod n.$$

WORKSHEET #3 
On the construction and deconstruction of codes. 


(1) In this exercise on the RSA code, you will encode a message for transmission, decode 
a transmitted message, and then break an intercepted coded message. All coding 
in this exercise assumes that the primes p and q are chosen between 100 and 200. 
(2) We use the alphabet 


A=V2 BAU CH= 13. DIT. 2 =91,. F = 31 CS 30. 
H = 33, I = 34, J = 35, K = 36, L = 37, M = 38, N = 39,
O = 40, P = 41, Q = 43, R = 44,
S = 45, T = 46, U = 47, V = 48, W = 49, X = 50, Y = 77, Z = 91,
blank = 93, ,= 98, . = 99. 


Assume throughout this exercise that the transmission blocks have length 5. (This 
is suggested by the size of n.) 

(3) The base n for the encryption algorithm for the SUPERSECRET CODE is publicly 
announced to be 23711. The exponent a for the encryption algorithm is chosen as 
(and announced to be) 121. 

(4) Encode the message 

STUDY MATHEMATICS. 
(5) The secret part of the SUPERSECRET CODE is the fact that $\varphi(n) = 23400$.
(6) Decode the intercepted message 
13615199172019408040129710095113768201941297101414 
186060185917475 




(7) Another SECRET CODE uses the base $n = 12091$ and exponent $a = 121$. The value of $\varphi(n)$ is under continuous lock and key. You will need to discover it to decode the intercepted message:

01111095650956504835012310990701111089500096604835 
099070483508950063380808609907048350160908950009660483511528 

(8) What is $\varphi(n)$? What is the content of the intercepted message?

(9) What in addition to $\varphi(n)$ did you need to know in order to break the code?

(10) Which code is easier to break, the SECRET CODE or the SUPERSECRET CODE? 
Why? 


10. A collection of beautiful results 


In this section we collect some beautiful consequences of the theory developed in this chapter, especially if the results suggest further areas and questions for study. We limit the discussion to results relevant to the high school mathematics curriculum. Some of the proofs “formally” require material to be developed in subsequent chapters of this book.


Primes cannot be congruent to 0 mod 4. The unique even prime is the only one congruent 
to 2 mod 4. An odd prime is hence congruent to 1 or 3 mod 4. 


THEOREM 1.93. Infinitely many primes are congruent to 3 mod 4. 


PROOF. The set of primes congruent to 3 mod 4 is certainly not empty, since it contains 3, 7 and 11, for example. Assume that it is a finite set: $\{3 = p_1, p_2, \ldots, p_r\}$. It suffices to produce a prime $p \equiv 3 \bmod 4$ not on this list. Let
$$Q = 4p_2 \cdots p_r + 3.$$
Then obviously $Q \equiv 3 \bmod 4$. Consider the prime factorization of $Q$. Since $Q$ is odd, the prime 2 does not appear in this factorization. Since 3 divides neither 4 nor any $p_j$ with $j \ge 2$, it does not divide $Q$, so this prime does not appear either. If only primes $\equiv 1 \bmod 4$ appeared, then $Q$ would also be $\equiv 1 \bmod 4$. We conclude that at least one prime $p \neq 3$, $p \equiv 3 \bmod 4$, must appear in the factorization. Now $p \neq p_j$ for $2 \le j \le r$ since such a $p_j$ does not divide $Q$. We have produced a prime $p \equiv 3 \bmod 4$ not on our list.
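The construction in the proof is effective: from any finite list of primes congruent to 3 mod 4 that contains 3, it produces another such prime. The sketch below carries it out with trial division; the function names are introduced here for illustration.

```python
def prime_factors(n: int) -> list[int]:
    """Distinct prime divisors of n in increasing order (trial division)."""
    factors = []
    d = 2
    while d * d <= n:
        if n % d == 0:
            factors.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def new_prime_3_mod_4(primes: list[int]) -> tuple[int, int]:
    """Given primes = 3 mod 4 (including 3), return (Q, p) with p a new such prime."""
    product = 1
    for p in primes:
        if p != 3:
            product *= p
    Q = 4 * product + 3
    for p in prime_factors(Q):
        if p % 4 == 3:
            return Q, p
    raise AssertionError("unreachable: Q is congruent to 3 mod 4")

print(new_prime_3_mod_4([3, 7, 11]))
print(new_prime_3_mod_4([3, 7, 11, 19]))
```

Note that $Q$ itself need not be prime; the second call returns a proper prime divisor of $Q$.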


REMARK 1.94. The theorem suggests many avenues for further exploration. 


• There are infinitely many primes $\equiv 1 \bmod 4$, but this is harder to establish than our last result.

• It is a consequence of a theorem of G. Lejeune Dirichlet that there are infinitely many primes $p$ of the form $an + b$, $n \in \mathbb{Z}_{>0}$, where $a$ and $b$ are fixed integers with $(a, b) = 1$.

• In 2004, Ben Green and Terence Tao proved the remarkable result that there are arbitrarily long arithmetic progressions of primes.


The next two theorems will appear more relevant after, and serve as an inducement for, 
the study of polynomials — in the last few chapters of this book. 


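THEOREM 1.95. If $\alpha \in \mathbb{Q}$ is a root of the monic polynomial
$$x^n + a_1 x^{n-1} + \cdots + a_n$$
with integer coefficients (thus $n \in \mathbb{Z}_{>0}$ and $a_j \in \mathbb{Z}$ for $1 \le j \le n$), then $\alpha \in \mathbb{Z}$.


PROOF. Since $\alpha = \frac{a}{b}$, $a \in \mathbb{Z}$, $b \in \mathbb{Z}_{>0}$, is a root of the above polynomial,
$$\left(\frac{a}{b}\right)^n + a_1 \left(\frac{a}{b}\right)^{n-1} + \cdots + a_n = 0;$$
it involves no loss of generality to assume that $(a, b) = 1$. Clearing fractions in the last equation we obtain
$$a^n + a_1 a^{n-1} b + \cdots + a_n b^n = 0,$$
equivalently
$$a^n = -b\left(a_1 a^{n-1} + \cdots + a_n b^{n-1}\right).$$
Thus $b \mid a^n$. Since $(a, b) = 1$, the last divisibility condition guarantees that $b = 1$ and thus $\alpha = a \in \mathbb{Z}$.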


THEOREM 1.96. If $\alpha \in \mathbb{Z}$ is a root of the monic polynomial
$$p(x) = x^n + a_1 x^{n-1} + \cdots + a_n,$$
then $\alpha \mid a_n$.

PROOF. The proof is by induction on $n$, the degree of the monic polynomial. If $n = 1$, then the polynomial is of the form $x + a_1$ and $-a_1$ is its root. Consider the general case $n > 1$. If the polynomial $p(x)$ does not have an integral root, there is nothing to prove. If it has an integral root $\alpha$, then by the division algorithm for polynomials
$$x^n + a_1 x^{n-1} + \cdots + a_n = (x - \alpha)\left(x^{n-1} + b_1 x^{n-2} + \cdots + b_{n-1}\right),$$
where $b_j \in \mathbb{Z}$ for $j = 1, 2, \ldots, n - 1$. Now $\alpha \mid a_n = -b_{n-1}\alpha$. If the polynomial $p(x)$ has another integral root $\beta$, then $\beta$ must be a root of the polynomial $x^{n-1} + b_1 x^{n-2} + \cdots + b_{n-1}$ of degree $n - 1$. The induction hypothesis tells us that $\beta \mid b_{n-1}$ and hence also $\beta \mid a_n$.
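Taken together, the last two theorems give a finite search for all rational roots of a monic polynomial with integer coefficients: every such root is an integer dividing the constant term. The following sketch assumes a nonzero constant term and represents the polynomial by its coefficient list `[1, a_1, ..., a_n]`; both conventions are choices made for this illustration.

```python
def evaluate(coeffs: list[int], x: int) -> int:
    """Evaluate the polynomial with the given coefficients at x (Horner's rule)."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value

def integer_roots(coeffs: list[int]) -> list[int]:
    """Integer roots of a monic integer polynomial with nonzero constant term."""
    if coeffs[0] != 1:
        raise ValueError("polynomial must be monic")
    a_n = coeffs[-1]
    if a_n == 0:
        raise ValueError("this sketch assumes a nonzero constant term")
    divisors = [d for d in range(1, abs(a_n) + 1) if a_n % d == 0]
    candidates = divisors + [-d for d in divisors]
    return sorted(x for x in candidates if evaluate(coeffs, x) == 0)

print(integer_roots([1, -6, 11, -6]))   # x^3 - 6x^2 + 11x - 6
print(integer_roots([1, 0, -2]))        # x^2 - 2 has no rational root
```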


CHAPTER 2 


Foundations 


This chapter consists of material that should be familiar to most readers (students). It 
should be reviewed, as needed, to establish a common notation for the author and reader. 
The last section on complex numbers is needed only for a discussion of examples and the
study of roots of polynomials. 


1. Naive set theory 


A set is a formally undefined object (informally, a collection of objects) containing (formally undefined) members or elements. The notation $x \in X$ (as well as $X \ni x$) is to denote that $X$ is a set and that $x$ is an element of $X$ (we will also say that $x$ belongs to $X$); similarly, $x \notin X$ is to denote that $X$ is a set and that $x$ is not an element of $X$. In Chapter 1 we worked with sets of integers and we denoted by expressions enclosed by braces $\{\ldots\}$ collections of integers. Typical ways of describing the set of even integers between 2 and 20 (inclusive) are:
$$\{2, 4, 6, 8, 10, 12, 14, 16, 18, 20\},$$
$$\{2r;\ r \in \mathbb{Z} \text{ and } 1 \le r \le 10\}$$
and
$$\{r \in \mathbb{Z};\ r \text{ is even and } 1 < r < 21\}.$$
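The three descriptions correspond closely to three ways of writing a set in a programming language: by listing, by a formula for the elements, and by a condition. A small sketch follows; since a program cannot range over all of $\mathbb{Z}$, the third description is restricted to a finite window, which is an assumption of the example.

```python
listed = {2, 4, 6, 8, 10, 12, 14, 16, 18, 20}
by_formula = {2 * r for r in range(1, 11)}                          # 1 <= r <= 10
by_condition = {r for r in range(-100, 101) if r % 2 == 0 and 1 < r < 21}

print(listed == by_formula == by_condition)
```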


To avoid logical complications we work in a universal set $U$ and assume that all the members of all sets under study are in $U$. Thus all sets are subsets (defined below) of $U$.


DEFINITION 2.1. We reserve the symbol $\emptyset$ for the empty set; the set containing no elements. For a given set $X$, $X^c$ denotes its complement consisting of the points in $U$ that are not in $X$;
$$X^c = \{x \in U;\ x \notin X\}.$$
Given two sets $X$ and $Y$, we define several relations between them and operations on them. We say that $X$ equals $Y$ (in symbols $X = Y$), if both sets contain exactly the same elements. We say that $X$ is a subset of $Y$ (or $X$ is included in $Y$) (in symbols $X \subseteq Y$ or $Y \supseteq X$) if every element $x \in X$ also belongs to $Y$. (Note that by our conventions, $X \subseteq U$, $X = Y$ if and only if $X \subseteq Y$ and $Y \subseteq X$, and $\emptyset \subseteq X$.) The set inclusion $X \subseteq Y$ is proper (in symbols $X \subsetneq Y$) if $X \neq Y$. We define the union and intersection of $X$ and $Y$ by
$$X \cup Y = \{x \in U;\ x \in X \text{ or } x \in Y\}$$
and
$$X \cap Y = \{x \in U;\ x \in X \text{ and } x \in Y\}.$$

The collection of sets satisfies many basic properties similar to those satisfied by the 
integers as illustrated in the following 




PROPOSITION 2.2. Let $X$, $Y$ and $Z$ be three sets (all contained in the same universal set $U$). The following properties hold:

(1) (idempotence) $X \cap X = X$ and $X \cup X = X$,

(2) (complementarity) $X \cap X^c = \emptyset$ and $X \cup X^c = U$,

(3) (commutativity) $X \cap Y = Y \cap X$ and $X \cup Y = Y \cup X$,

(4) (associativity) $X \cap (Y \cap Z) = (X \cap Y) \cap Z$ and $X \cup (Y \cup Z) = (X \cup Y) \cup Z$,

(5) (de Morgan’s laws) $(X \cap Y)^c = X^c \cup Y^c$ and $(X \cup Y)^c = X^c \cap Y^c$,

(6) (distributivity) $X \cap (Y \cup Z) = (X \cap Y) \cup (X \cap Z)$ and $X \cup (Y \cap Z) = (X \cup Y) \cap (X \cup Z)$,

(7) (complementation is an involution) $(X^c)^c = X$,

(8) (properties of $\emptyset$) $X \cap \emptyset = \emptyset$ and $X \cup \emptyset = X$,

(9) (properties of $U$) $X \cap U = X$ and $X \cup U = U$, and

(10) (absorption properties) $X \cap (X \cup Y) = X$ and $X \cup (X \cap Y) = X$.


PROOF. We establish only the first equality in (5) and (7), leaving it to the reader to fill in sequentially the missing proofs. We start with (5). Let $x \in (X \cap Y)^c$. Then $x \notin X \cap Y$. So $x \notin X$ or $x \notin Y$. In the first possibility, $x \in X^c$; in the second $x \in Y^c$. So certainly $x \in X^c \cup Y^c$ and thus $(X \cap Y)^c \subseteq X^c \cup Y^c$. Conversely, if $x \in X^c \cup Y^c$, then either $x \in X^c$ or $x \in Y^c$. So either $x \notin X$ or $x \notin Y$. So certainly, $x \notin X \cap Y$ and hence $x \in (X \cap Y)^c$. Thus we have the inclusion $(X \cap Y)^c \supseteq X^c \cup Y^c$. The two inclusions we have established show that the two sets are equal.

To show that (7) holds we note that $x \in (X^c)^c$ if and only if $x \notin X^c$ if and only if $x \in X$.
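For a small universal set the identities of Proposition 2.2 can be verified exhaustively. The sketch below checks de Morgan's laws (5) and the involution property (7) over all subsets of a four-element universe; the choice $U = \{0, 1, 2, 3\}$ and the helper names are assumptions of the example.

```python
from itertools import combinations

U = frozenset(range(4))

def all_subsets(s: frozenset) -> list[frozenset]:
    """Every subset of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]

def complement(x: frozenset) -> frozenset:
    """Complement of x inside the universal set U."""
    return U - x

subsets = all_subsets(U)
de_morgan = all(
    complement(x & y) == complement(x) | complement(y)
    and complement(x | y) == complement(x) & complement(y)
    for x in subsets
    for y in subsets
)
involution = all(complement(complement(x)) == x for x in subsets)
print(len(subsets), de_morgan, involution)
```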


DEFINITION 2.3. (INFORMAL) If $X$ is a set, we denote by $|X|$, its cardinality, the number of elements it contains. Note that $|X| \in \mathbb{N} \cup \{\infty\}$. (The symbol $\infty$, infinity, is so far undefined.) See Definition 2.18 for a formal definition of cardinality of a set.


DEFINITION 2.4. The (Cartesian) product of the two sets $X$ and $Y$ is defined as the set of ordered pairs whose first components are from $X$ and second, from $Y$:
$$X \times Y = \{(x, y);\ x \in X \text{ and } y \in Y\}.$$
The difference of two sets $X$ and $Y$ is defined by
$$X - Y = X \cap Y^c.$$


EXERCISES 


(1) Prove (6) of Proposition 2.2. 

(2) If $X$ is a set with $n \in \mathbb{Z}_{>0}$ elements, how many elements are there in $P(X)$, the set of subsets of $X$?

(3) Show that $|X \times Y| = |X|\,|Y|$. Include the possibility that either or both sets are empty or contain infinitely many elements. What is the appropriate interpretation of $0 \cdot \infty$ in this context?


2. Functions 


Perhaps the most important concept in mathematics is that of a function (from one set 
to another). The concept alone is not sufficient. We must also have good notation for it. 




DEFINITION 2.5. (INFORMAL) Let $X$ and $Y$ be sets. A function (map or mapping) $f$ from $X$ to $Y$ is an assignment of an element $f(x) \in Y$ to each element $x \in X$. We use the self-explanatory notation
$$f : X \to Y \quad \text{and} \quad f : X \ni x \mapsto f(x) \in Y$$
to give more information on $f$. The set $X$ is the domain of $f$, the set $Y$, its target or codomain and
$$f(X) = \{y \in Y;\ y = f(x) \text{ for some } x \in X\} \subseteq Y,$$
its range or image.

DEFINITION 2.6. The graph, $\mathrm{Gr}(f)$, of a function $f : X \to Y$ is defined by
$$\mathrm{Gr}(f) = \{(x, y) \in X \times Y;\ y = f(x)\} \subseteq X \times Y.$$

Note that for each $x \in X$, there is precisely one $y \in Y$ such that $(x, y) \in \mathrm{Gr}(f)$. Thus for a function $f : \mathbb{R} \to \mathbb{R}$, $\mathrm{Gr}(f)$ is a subset of $\mathbb{R}^2$ with some additional properties (see two examples below).
We are now ready for a formal definition of a function. 


DEFINITION 2.7. Let $X$ and $Y$ be sets. A function $f : X \to Y$ is a subset $G \subseteq X \times Y$ with the properties
(1) for all $x \in X$, there exists a $y \in Y$ such that $(x, y) \in G$, and
(2) whenever $(x, y_1)$ and $(x, y_2) \in G$, then $y_1 = y_2$.
For a given $x \in X$, the unique $y \in Y$ with $(x, y) \in G$ is denoted by $f(x)$ and called the image of $x$ under $f$.


Thus for functions $f : J \to \mathbb{R}$ defined on an interval $J \subseteq \mathbb{R}$, the graph of $f$ intersects any vertical line at most once. The first of the next two figures illustrates the intersection of a graph of a function with a vertical line; while in the second, the curved figure is not the graph of a function.


[Two figures: the graph of a function meeting a vertical line, and a curve that is not the graph of a function.]


DEFINITION 2.8. A function $f : X \to Y$ is injective or one-to-one or an injection if whenever $x_1$ and $x_2 \in X$ with $f(x_1) = f(x_2)$, then $x_1 = x_2$. The function is surjective or onto or a surjection if for every $y \in Y$ there exists at least one $x \in X$ with $f(x) = y$. Finally, $f$ is bijective or a bijection if it is both injective and surjective.
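For finite sets, the formal definition of a function as a set of pairs, and the notions of injectivity and surjectivity, can be tested mechanically. The sketch below represents a candidate function by a set `G` of pairs; the helper names and the sample sets are assumptions made for the example.

```python
def is_function(G: set, X: set, Y: set) -> bool:
    """G is a subset of X x Y with exactly one pair (x, y) for each x in X."""
    inside = all(x in X and y in Y for (x, y) in G)
    one_image_each = all(sum(1 for (a, _) in G if a == x) == 1 for x in X)
    return inside and one_image_each

def is_injective(G: set) -> bool:
    """No value is taken twice."""
    values = [y for (_, y) in G]
    return len(values) == len(set(values))

def is_surjective(G: set, Y: set) -> bool:
    """Every element of Y is a value."""
    return {y for (_, y) in G} == set(Y)

X, Y = {1, 2, 3}, {"a", "b", "c"}
bijection = {(1, "a"), (2, "b"), (3, "c")}
collapsing = {(1, "a"), (2, "a"), (3, "b")}
not_a_function = {(1, "a"), (1, "b"), (2, "c")}

print(is_function(bijection, X, Y), is_injective(bijection), is_surjective(bijection, Y))
print(is_function(collapsing, X, Y), is_injective(collapsing), is_surjective(collapsing, Y))
print(is_function(not_a_function, X, Y))
```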


DEFINITION 2.9. Let $X$ and $Y$ be sets. The identity function on $X$, $\mathrm{id}_X$, is the function which takes every element of $X$ onto itself:
$$\mathrm{id}_X : X \ni x \mapsto x \in X.$$
Whenever the set $X$ is clear from the context, we will denote $\mathrm{id}_X$ by $\mathrm{id}$. If we choose a $c \in Y$, then the constant function on $X$, $f_c$, is the function which takes every element of $X$ onto $c$:
$$f_c : X \ni x \mapsto c \in Y.$$


DEFINITION 2.10. Let $f : X \to Y$ and $g : Y \to Z$ be functions. We define the composite function or composition
$$g \circ f : X \ni x \mapsto g(f(x)) \in Z.$$
If $f : X \to X$, the composition $f \circ f$ is also denoted by $f^2$ and $f^{\circ 2}$. Similarly$^{1}$, the composition of $f$ with itself $n \in \mathbb{Z}_{>0}$ times is denoted by $f^n$ and $f^{\circ n}$. Note that $f^0 = \mathrm{id}_X$.

PROPOSITION 2.11. If $f : X \to Y$, $g : Y \to Z$ and $h : Z \to W$ are functions, then
$$h \circ (g \circ f) = (h \circ g) \circ f.$$

PROOF. Both functions send $x \in X$ to $h(g(f(x))) \in W$.


$^{1}$ Under certain circumstances (for example, if $Y = \mathbb{R}$) functions can be multiplied. In such cases $f^n$ also stands for the $n$-fold product of $f$. The context usually makes it clear which meaning applies.


DEFINITION 2.12. Let $f : X \to Y$ be a function. A function $g : Y \to X$ is an inverse of $f$ if $g \circ f = \mathrm{id}_X$ and $f \circ g = \mathrm{id}_Y$.


PROPOSITION 2.13. The inverse function, when it exists, is unique.

PROOF. Assume that $f : X \to Y$ has inverses $g : Y \to X$ and $h : Y \to X$. Then
$$g = g \circ \mathrm{id}_Y = g \circ (f \circ h) = (g \circ f) \circ h = \mathrm{id}_X \circ h = h.$$


NOTATION 2.14. The inverse of the function $f : X \to Y$, when it exists, is denoted by $f^{-1}$.


CAUTION 2.15. If $f : X \to \mathbb{R} - \{0\}$ is a function, then the reciprocal function $x \mapsto \frac{1}{f(x)}$ is also denoted at times by $f^{-1}$. The context usually makes it clear whether the inverse or reciprocal is meant.


PROPOSITION 2.16. A function has an inverse if and only if it is a bijection.


PROOF. Let $f : X \to Y$ be a function. If $f^{-1} : Y \to X$ is its inverse, then for $x_1$ and $x_2 \in X$ with $f(x_1) = f(x_2)$ we have
$$x_1 = f^{-1}(f(x_1)) = f^{-1}(f(x_2)) = x_2.$$
Thus $f$ is injective. For $y \in Y$, $f(f^{-1}(y)) = y$. Hence $f$ is surjective. Conversely if $f$ is bijective, then for each $y \in Y$, there exists a unique $x \in X$ such that $f(x) = y$. Define $f^{-1}(y) = x$.

COROLLARY 2.17. If $f : X \to Y$ and $g : Y \to Z$ are bijections, then
(i) $g \circ f : X \to Z$ is a bijection and $(g \circ f)^{-1} = f^{-1} \circ g^{-1}$ and
(ii) $f^{-1} : Y \to X$ is a bijection and $(f^{-1})^{-1} = f$. Also
(iii) $\mathrm{id}_X : X \to X$ is a bijection and $\mathrm{id}_X^{-1} = \mathrm{id}_X$.


PROOF. Since $f$ and $g$ are bijections, $f^{-1}$ and $g^{-1}$ exist. Let $x_1$ and $x_2 \in X$ and assume that $(g \circ f)(x_1) = (g \circ f)(x_2)$. Applying $g^{-1}$ to both sides we conclude that $f(x_1) = f(x_2)$. If we now apply $f^{-1}$ to both sides, we see that $x_1 = x_2$. Thus $g \circ f$ is injective. For this part of the proof we only need the injectivity of both $f$ and $g$. The reader should rework the argument to use only this information. If $z \in Z$, then since $g$ is surjective, there exists a $y \in Y$ such that $g(y) = z$. The surjectivity of $f$ implies the existence of an $x \in X$ with $f(x) = y$. Thus $g(f(x)) = z$ and we conclude that $g \circ f$ is surjective. Now
$$(g \circ f) \circ (f^{-1} \circ g^{-1}) = g \circ \mathrm{id}_Y \circ g^{-1} = g \circ g^{-1} = \mathrm{id}_Z.$$
Similarly,
$$(f^{-1} \circ g^{-1}) \circ (g \circ f) = \mathrm{id}_X.$$
This establishes (i). The proofs of (ii) and (iii) are left to the reader.
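Part (i) of the corollary can be illustrated with finite bijections stored as dictionaries. In this sketch `compose` and `inverse` are hypothetical helpers, and the sample maps `f` and `g` are chosen arbitrarily.

```python
def compose(g: dict, f: dict) -> dict:
    """The composition g o f, sending x to g[f[x]]."""
    return {x: g[f[x]] for x in f}

def inverse(f: dict) -> dict:
    """Inverse of an injective finite map, defined on its image."""
    inv = {y: x for x, y in f.items()}
    if len(inv) != len(f):
        raise ValueError("map is not injective")
    return inv

f = {1: "a", 2: "b", 3: "c"}          # bijection X -> Y
g = {"a": 10, "b": 20, "c": 30}       # bijection Y -> Z

lhs = inverse(compose(g, f))                  # (g o f)^(-1)
rhs = compose(inverse(f), inverse(g))         # f^(-1) o g^(-1)
print(lhs == rhs, lhs)
```

Note the reversal of order on the right-hand side: $g^{-1}$ is applied first, then $f^{-1}$.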


DEFINITION 2.18. We say that two sets $X$ and $Y$ have the same cardinality if there exists a bijection between them (in symbols $|X| = |Y|$). Note that we have not (yet formally) defined the cardinality $|X|$ of a set $X$.$^{2}$ We now define $|\emptyset| = 0$ and for all $n \in \mathbb{Z}_{>0}$, $|\{1, 2, \ldots, n\}| = n$. We say that a set $X$ has finite cardinality ($|X| < \infty$) or $X$ is a finite set, if it is either the empty set ($|X| = 0$ in this case) or in bijective correspondence with the set consisting of the first $n$ positive integers for some $n \in \mathbb{Z}_{>0}$ ($|X| = n$). Otherwise, we say that $X$ has infinite cardinality and write $|X| = \infty$. Thus $|X|$ denotes the number of elements in $X$. We say that an infinite set is countable or has countable cardinality if it is in bijective correspondence with the set of positive integers.


$^{2}$ Only what it means for two sets to have the same cardinality.


PROPOSITION 2.19. Suppose $X$ and $Y$ are finite sets, then
$$|X \cup Y| + |X \cap Y| = |X| + |Y|.$$


PROOF. If either $X$ or $Y = \emptyset$, we may assume that $Y = \emptyset$. In this case $X \cup Y = X$ and $X \cap Y = \emptyset$ and the equality of the proposition is trivially true. So assume that neither $X$ nor $Y = \emptyset$ and that $X \cap Y = \emptyset$. Assume that $|X| = n$ and $|Y| = m$. Thus there exist bijections $f : X \to \{1, 2, \ldots, n\}$ and $g : Y \to \{1, 2, \ldots, m\}$. We set up a bijection $h : X \cup Y \to \{1, 2, \ldots, n + m\}$, by defining
$$h(x) = \begin{cases} f(x) & \text{for } x \in X \\ g(x) + n & \text{for } x \in Y \end{cases}.$$
We use the fact that $X$ and $Y$ are disjoint to conclude that $h$ is well defined (makes sense). The map $h$ is clearly both injective and surjective. For the general case we note that the sets $X$ and $Y - (X \cap Y)$ are disjoint and that
$$X \cup Y = X \cup [Y - (X \cap Y)].$$
Thus
$$|X \cup Y| = |X \cup [Y - (X \cap Y)]| = |X| + |Y - (X \cap Y)| = |X| + |Y| - |X \cap Y|.$$
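The counting identity and the disjoint decomposition used in the general case of the proof can be observed on concrete finite sets. The particular sets below are assumptions chosen for the illustration.

```python
X = set(range(0, 10))       # {0, ..., 9}
Y = set(range(5, 20))       # {5, ..., 19}

Y_only = Y - (X & Y)        # the part of Y outside X
disjoint = X.isdisjoint(Y_only)
same_union = (X | Y) == (X | Y_only)

lhs = len(X | Y) + len(X & Y)
rhs = len(X) + len(Y)
print(disjoint, same_union, lhs, rhs)
```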


REMARK 2.20. (1) The sets $\mathbb{Z}$ and $\mathbb{Q}$ are countable; so are the sets $2\mathbb{Z}$ and $\mathbb{Q}_{>0}$; the sets $\mathbb{R}$ and $\mathbb{C}$ are not.
(2) A countable union of countable sets is countable. 


3. Relations 


Relations, our next concept, are generalizations of functions. They play a key role in 
algebra and almost all branches of mathematics. 


DEFINITION 2.21. Let $X$ and $Y$ be sets. A relation $R$ from $X$ to $Y$ is a subset of the Cartesian product $X \times Y$ ($R \subseteq X \times Y$). It is convenient to write $xRy$ for $(x, y) \in R$. A relation from $X$ to $X$ is also called a relation on $X$ (these are the most common types).


REMARK 2.22. If $f : X \to Y$ is a function, then $\mathrm{Gr}(f)$ is a relation from $X$ to $Y$.

DEFINITION 2.23. Let $R$ be a relation on a set $X$. We say that $R$ is

• reflexive if $xRx$ for all $x \in X$,

• symmetric if for all $x$ and $y \in X$, $xRy$ implies $yRx$ (equivalently, if for all $x$ and $y \in X$, $xRy$ if and only if $yRx$),

• weakly antisymmetric if for all $x$ and $y \in X$, $xRy$ and $yRx$ implies that $x = y$,

• antisymmetric if for all $x$ and $y \in X$, $xRy$ implies that $(y, x) \notin R$, and

• transitive if for all $x$, $y$ and $z \in X$, $xRy$ and $yRz$ implies that $xRz$.




EXAMPLE 2.24. We examine certain relations on Z. 


• Equality ($=$) is reflexive, symmetric, weakly antisymmetric, not antisymmetric, and transitive.

• Greater than or equal ($\ge$) is reflexive, not symmetric, weakly antisymmetric, not antisymmetric, and transitive.

• Greater than ($>$) is not reflexive, not symmetric, weakly antisymmetric, antisymmetric, and transitive.

• Congruence of integers modulo $n \in \mathbb{Z}_{>1}$ ($\equiv \bmod n$) is reflexive, symmetric, not weakly antisymmetric, not antisymmetric, and transitive.

• Let $f : \mathbb{Z} \to \mathbb{Z}$ be a function. We define
$$R = \{(x, f(x));\ x \in \mathbb{Z}\}.$$
Then, in general, $R$ is not reflexive, not symmetric, not weakly antisymmetric, not antisymmetric, and not transitive. Neither is
$$R = \{(f(x), x);\ x \in \mathbb{Z}\}.$$
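The five properties of Definition 2.23 are easy to test for a relation on a finite set. The sketch below restricts four of the relations above to the window $\{-3, \ldots, 3\}$ of $\mathbb{Z}$, an assumption needed to keep the sets finite, and reports for each the tuple (reflexive, symmetric, weakly antisymmetric, antisymmetric, transitive).

```python
def properties(R: set, X: list) -> tuple:
    """(reflexive, symmetric, weakly antisymmetric, antisymmetric, transitive)."""
    reflexive = all((x, x) in R for x in X)
    symmetric = all((y, x) in R for (x, y) in R)
    weakly_antisymmetric = all(x == y for (x, y) in R if (y, x) in R)
    antisymmetric = all((y, x) not in R for (x, y) in R)
    transitive = all((x, w) in R for (x, y) in R for (z, w) in R if y == z)
    return reflexive, symmetric, weakly_antisymmetric, antisymmetric, transitive

X = list(range(-3, 4))
pairs = [(x, y) for x in X for y in X]

equal = {(x, y) for (x, y) in pairs if x == y}
greater_equal = {(x, y) for (x, y) in pairs if x >= y}
greater = {(x, y) for (x, y) in pairs if x > y}
congruent_mod_3 = {(x, y) for (x, y) in pairs if (x - y) % 3 == 0}

for name, R in [("=", equal), (">=", greater_equal), (">", greater), ("mod 3", congruent_mod_3)]:
    print(name, properties(R, X))
```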

DEFINITION 2.25. A graph is a collection of points (called vertices) and lines (called edges) joining some pairs of points. A directed graph or a digraph is a graph where each edge has a direction (an arrow from its originating vertex to its terminating vertex). We note that two vertices may be joined by more than one edge.


A useful way to represent pictorially a relation $R$ on a set $X$ is by its digraph $\Gamma(R)$ constructed as follows. The vertices of $\Gamma(R)$ are the points $x \in X$. A directed edge starts at $x$ and ends at $y$ if and only if $xRy$.


EXAMPLE 2.26. Consider the set $X_4 = \{1, 2, 3, 4\}$. We let $R$ be the relation $\ge$. Thus
$$R = \{(1, 1), (2, 1), (2, 2), (3, 1), (3, 2), (3, 3), (4, 1), (4, 2), (4, 3), (4, 4)\}.$$
Its directed graph is

[Figure: the digraph $\Gamma(R)$.]


Another useful way to represent a relation $R$ on a set $X$ is by an adjacency matrix $M(R)$ constructed as follows. We index the rows and columns of $M(R)$ by the points $x \in X$. Each entry in $M(R)$ is either a zero or a one. We define the entry corresponding to the row indexed by $x$ and the column indexed by $y$ to be 1 if $xRy$ and to be 0 if $(x, y) \notin R$.




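EXAMPLE 2.27. For the relation $R$ of the previous example
$$M(R) = \begin{array}{c|cccc} & 1 & 2 & 3 & 4 \\ \hline 1 & 1 & 0 & 0 & 0 \\ 2 & 1 & 1 & 0 & 0 \\ 3 & 1 & 1 & 1 & 0 \\ 4 & 1 & 1 & 1 & 1 \end{array}$$
Note that in general the adjacency matrix of a relation can be infinite. For example, for the relation $=$ on $\mathbb{Z}_{\ge 0}$, the adjacency matrix is infinite symmetric with ones along the diagonal and zeros elsewhere:
$$\begin{array}{c|cccccc} & 0 & 1 & 2 & 3 & 4 & \cdots \\ \hline 0 & 1 & 0 & 0 & 0 & 0 & \cdots \\ 1 & 0 & 1 & 0 & 0 & 0 & \cdots \\ 2 & 0 & 0 & 1 & 0 & 0 & \cdots \\ 3 & 0 & 0 & 0 & 1 & 0 & \cdots \\ 4 & 0 & 0 & 0 & 0 & 1 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{array}$$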

Among the most interesting relations are those that satisfy a number of the properties 
of Definition 2.23. We give some of these special names. 


DEFINITION 2.28. A relation R on a set X is 


• a partial order if it is reflexive, weakly antisymmetric and transitive (we also say that $X$ is partially ordered by $R$ and that $X$ is a partially ordered set),

• a strict partial order if it is antisymmetric and transitive, and

• an equivalence relation if it is reflexive, symmetric and transitive.


EXAMPLE 2.29. The relations <, < and = on Z are a partial order, a strict partial order 
and an equivalence relation, respectively. 


DEFINITION 2.30. Let $R$ be a strict partial order on a set $X$. Let $x$ and $y \in X$. We say
that $y$ is an immediate successor of $x$ (and $x$ is an immediate predecessor of $y$) if $xRy$, and
if for some $z \in X$, $xRz$ and $zRy$, then $z = y$. If $R$ is a partial order (perhaps not strict), we
modify the above definition by requiring that $y \neq x$.


The concept of successor can be illustrated by a graph. Let $R$ be a partial order on a
set $X$. The Hasse graph $G$ of $R$ is a graph whose vertices are the points of $X$; whenever $x$ and
$y \in X$ and $y$ is an immediate successor of $x$, $G$ has a directed edge from $x$ to $y$.


EXAMPLE 2.31. Fix an integer n > 1. Congruence modulo n is an equivalence relation 
on Z. 


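DEFINITION 2.32. Let $X$ and $I$ be nonempty sets. A partition of $X$ (indexed by $I$) is a
collection of subsets $\{X_i ; i \in I\}$ of $X$ such that

\[ \bigcup_{i \in I} X_i = X \]

and

\[ X_i \cap X_j = \emptyset \quad \text{for all } i, j \in I \text{ with } i \neq j. \]

We call the $X_i$ the blocks of the partition.
Partitions and equivalence relations are essentially the same thing, as shown by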
Partitions and equivalence relations are essentially the same thing as shown by 


THEOREM 2.33. Let $X$ be a nonempty set. If $\{X_i ; i \in I\}$ is a partition of $X$, we define
a relation $R$ on $X$ by $xRy$ for $x$ and $y \in X$ if and only if $x$ and $y \in X_i$ for some $i \in I$. This
relation is an equivalence relation.


PROOF. It is of course obvious that $R$ defines a relation on $X$. Let $x \in X$. Since $x$ lies in some block, $xRx$, and
$R$ is reflexive. If $x$ and $y \in X$ and $xRy$, then $x$ and $y$ are in the same block of the partition.
So $yRx$ and $R$ is symmetric. Now let us take $x$, $y$ and $z \in X$ with $xRy$ and $yRz$. Thus $x$
and $y$ are in the same block and $y$ and $z$ are in the same block. Since distinct blocks are disjoint, we conclude that $x$ and $z$
are in the same block and thus $xRz$; that is, $R$ is transitive.


We outline the proof of the converse to the above theorem. Let $E$ be an equivalence
relation on $X$. For each $x \in X$, we form the set

\[ E_x = \{y \in X ; xEy\}. \]

The collection of subsets of $X$,

\[ V = \{E_x ; x \in X\}, \]

contains many equal elements. We remove from $V$ all but one copy of every collection of
equal elements in this set. The remaining sets in $V$ are the blocks of a partition of $X$. An
$x \in X$ belongs to $E_x$ and since $E_x \in V$, the union of the sets in $V$ is all of $X$. Let $x$ and
$z \in X$ and assume that $E_x \cap E_z$ contains an element $y \in X$. Thus $xEy$ and $yEz$ and hence
also $xEz$. But this means that $z \in E_x$. Next, if $w \in E_z$, then $zEw$. Thus also $xEw$ and
$w \in E_x$. We have shown that $E_z \subset E_x$. By symmetry, also $E_x \subset E_z$ and thus $E_x = E_z$. So
the blocks are disjoint.
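The passage from an equivalence relation to its partition can be sketched as follows. This is a minimal illustration under the assumption that the set is finite and that `related` really is an equivalence relation; the function name `equivalence_classes` is hypothetical.

```python
# Sketch: the blocks E_x of an equivalence relation on a finite set.
# Because the relation is assumed to be an equivalence, it suffices to
# compare each point with one representative of every block found so far.
def equivalence_classes(points, related):
    blocks = []
    for x in points:
        for block in blocks:
            if related(block[0], x):   # x belongs to the block of block[0]
                block.append(x)
                break
        else:
            blocks.append([x])         # x starts a new block
    return blocks

# Congruence modulo 3 on {0, 1, ..., 9}
blocks = equivalence_classes(range(10), lambda x, y: (x - y) % 3 == 0)
print(blocks)
```

The blocks are pairwise disjoint and their union is the whole set, as the argument above requires.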


4. Order relations on Z and Q 


4.1. Orders on Z. Using elementary set theory one constructs the natural numbers
$\mathbb{N}$. From $\mathbb{N}$ one proceeds to the construction of the integers $\mathbb{Z}$ (as the disjoint union of $\mathbb{N}$
and $\mathbb{N} - \{0\}$) and its binary operations of addition $+$ and multiplication $\cdot$, resulting in the
commutative ring $(\mathbb{Z},+,\cdot)$. In this section, as an illustration, we describe one relation on $\mathbb{Z}$.

The most basic relation on Z is that of equality (=). It is an equivalence relation and it 
partitions Z into subsets consisting of single elements. 

We turn to a study of the second most important relation on $\mathbb{Z}$.


DEFINITION 2.34. Let $a$ and $b \in \mathbb{Z}$. We say that $b$ is greater than or equal to $a$ (in
symbols $b \geq a$) if and only if $b - a \in \mathbb{N}$.


As noted earlier $\geq$ is reflexive ($a \geq a$ for all $a \in \mathbb{Z}$ since $a - a = 0 \in \mathbb{N}$), weakly
antisymmetric (if $a \geq b$ and $b \geq a$, then $a - b$ and $b - a \in \mathbb{N}$, which implies that $a - b = 0$)
and transitive (if $a \geq b$ and $b \geq c$, then $a - b$ and $b - c \in \mathbb{N}$, which tells us that also
$a - c = (a - b) + (b - c) \in \mathbb{N}$).

An integer is positive if it belongs to $\mathbb{N} - \{0\}$ and negative if it does not belong to $\mathbb{N}$.
The set of positive integers is closed under addition and multiplication; the set of negative
integers is closed under addition, but not under multiplication. If $a$ and $b \in \mathbb{Z}$ and $c$ is a positive integer,
then the cancelation property

\[ b \geq a \iff bc \geq ac \]

holds. All the above properties (propositions) require, of course, formal proofs.


4.2. Orders on Q. Recall that a rational $\frac{a}{b}$ (here $a \in \mathbb{Z}$ and $b \in \mathbb{Z}_{>0}$) is an equivalence
class of pairs of integers.


DEFINITION 2.35. Let $\frac{a}{b}$ and $\frac{c}{d}$ be rational numbers. We say that $\frac{c}{d}$ is greater than or
equal to $\frac{a}{b}$ (in symbols $\frac{c}{d} \geq \frac{a}{b}$) if and only if $cb \geq ad$.


A first task is to show that the concept of $\geq$ is well defined on $\mathbb{Q}$. So assume that $\frac{a}{b} = \frac{a'}{b'}$
and $\frac{c}{d} = \frac{c'}{d'}$. We have to show here that $cb \geq ad$ if and only if $c'b' \geq a'd'$. The definition of
rational numbers tells us that $ab' = ba'$ and $cd' = dc'$. Using the last two equalities and the
cancelation law, we conclude that

\[ cb \geq ad \iff cbb' \geq adb' = ba'd \iff cb' \geq a'd \iff dc'b' = cb'd' \geq a'dd' \iff c'b' \geq a'd'. \]
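As a numerical sanity check (not a proof), the criterion $cb \geq ad$ can be compared with Python's exact rational type, which always reduces a fraction to lowest terms. Agreement for every pair of representatives in a small range illustrates that the answer does not depend on the representatives chosen. The function name `geq` is an assumption made for this sketch.

```python
from fractions import Fraction
from itertools import product

def geq(c, d, a, b):
    """c/d >= a/b, for integers with positive denominators b and d."""
    return c * b >= a * d

# Compare with Fraction, which normalizes each representative.
agrees = all(
    geq(c, d, a, b) == (Fraction(c, d) >= Fraction(a, b))
    for a, c in product(range(-4, 5), repeat=2)
    for b, d in product(range(1, 5), repeat=2)
)
print(agrees)
```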
5. The complex numbers 


Mostly as a source for examples, we study³ the complex numbers $(\mathbb{C},+,\cdot)$ under the
operations of addition and multiplication. This number system shares many, but not all, of
the properties of the real numbers $(\mathbb{R},+,\cdot)$. Missing is the canonical ordering (in general one
studies $(\mathbb{R},+,\cdot,>)$ rather than just $(\mathbb{R},+,\cdot)$). The complex numbers satisfy all the rules of
addition and multiplication satisfied by the real numbers (in the language discussed in §1 of
Chapter 5, they form a field). To construct $\mathbb{C}$ we start with $\mathbb{R}$ and introduce a new symbol
$i$ that satisfies

\[ i^2 = -1. \]

We can view $\mathbb{C}$ as consisting of numbers of the form $c = a + bi$, with $a$ and $b \in \mathbb{R}$. Addition
of such numbers is vector (component) sum; thus

\[ (a_1 + b_1 i) + (a_2 + b_2 i) = (a_1 + a_2) + (b_1 + b_2) i. \]

It agrees with vector addition in $\mathbb{R}^2$ if we identify the complex numbers $\mathbb{C}$ with the Cartesian
plane $\mathbb{R}^2$. In this identification we use 1 and $i$ as a basis for $\mathbb{C}$ over $\mathbb{R}$ that corresponds to
the usual basis $(1,0)$ and $(0,1)$ for $\mathbb{R}^2$. Multiplication seems a bit more complicated:

\[ (a_1 + b_1 i)(a_2 + b_2 i) = (a_1 a_2 - b_1 b_2) + (a_1 b_2 + a_2 b_1) i. \]

The multiplicative inverse or reciprocal of a non-zero complex number is again a complex
number; to describe it, it is convenient to first introduce the conjugate $\bar{c}$ of the complex
number $c = a + bi$ as

\[ \bar{c} = a - bi \]

and the absolute value or modulus $|c|$ of $c$ as

\[ |c| = \sqrt{a^2 + b^2} = \sqrt{c\bar{c}}. \]

With these preliminaries out of the way, the reciprocal of the complex number $0 \neq c = a + bi$
is easily seen to be

\[ \frac{1}{c} = \frac{1}{a + bi} = \frac{1}{a + bi} \cdot \frac{a - bi}{a - bi} = \frac{a - bi}{a^2 + b^2} = \frac{\bar{c}}{|c|^2}. \]


Using the last formula it is easy to compute $\frac{a_1 + b_1 i}{a_2 + b_2 i}$ (we must, of course, assume that $a_2 + b_2 i \neq 0$).

³ Presumably, a review for most readers.
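The reciprocal formula $\bar{c}/|c|^2$ can be tried out with exact rational arithmetic. The sketch below represents $a + bi$ by the pair $(a, b)$; the function names `reciprocal` and `quotient` are illustrative assumptions.

```python
from fractions import Fraction

def reciprocal(a, b):
    """Return (x, y) with x + y i = 1/(a + b i), computed as conj(c)/|c|^2."""
    n = Fraction(a * a + b * b)          # |c|^2
    if n == 0:
        raise ZeroDivisionError("0 has no reciprocal")
    return (a / n, -b / n)

def quotient(a1, b1, a2, b2):
    """Return (a1 + b1 i)/(a2 + b2 i) as a pair, using the reciprocal."""
    x, y = reciprocal(a2, b2)
    # multiply (a1 + b1 i) by (x + y i)
    return (a1 * x - b1 * y, a1 * y + b1 * x)

print(reciprocal(3, 4))      # 1/(3 + 4i)
print(quotient(1, 2, 3, 4))  # (1 + 2i)/(3 + 4i)
```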

Let us identify the complex number $c = a + bi \in \mathbb{C}$ (remember that here both $a$ and
$b \in \mathbb{R}$) with the point in the Cartesian plane $(a,b) \in \mathbb{R}^2$. Thus we think of $c$ as a directed
line segment from the origin in $\mathbb{R}^2$ to the point $(a,b) \in \mathbb{R}^2$; an arrow (direction). For graphic
representations, there is nothing magic about starting at 0. The same vector is obtained
by moving the arrow (while preserving its length and direction) to start at any point in the
Cartesian plane. The graphic interpretation of complex addition (addition of vectors) is now
easily illustrated (see Figure 1).


FIGURE 1. Addition of complex numbers. 


To add the points $z$ and $w \in \mathbb{C}$, we represent them as directed line segments starting
at the origin in $\mathbb{R}^2$. We then move the arrow corresponding to $w$ to start at the end point
of $z$. The arrow from the origin to the end point of the transported $w$ now represents the
sum $z + w$. We can also transport $z$ to start at the end point of $w$. We thus form a closed
parallelogram; its main diagonal (the one starting at 0) represents $z + w$; its other diagonal
(from $w$ to $z$) transported to 0 represents $z - w$. We have used rectangular coordinates on
$\mathbb{R}^2$ for a geometric interpretation of complex addition.

Polar coordinates are useful to obtain a geometric interpretation of complex multiplication.
If we represent the non-zero complex number $c = a + bi$ as the point $(a,b) \in \mathbb{R}^2$,
then we can associate with it two other real numbers $r = \sqrt{a^2 + b^2} = |c|$ (note that $r$, the
absolute value of the non-zero complex number $c$, is positive) and $\theta = \arcsin\frac{b}{r} = \arccos\frac{a}{r}$;
that is, $\theta$ satisfies both $\sin\theta = \frac{b}{r}$ and $\cos\theta = \frac{a}{r}$.
The two equations defining $\theta$ specify it uniquely up to an ambiguity of the form $2\pi n$ with
$n \in \mathbb{Z}$. (Note that either single equation would involve a "bigger" ambiguity.) We call $\theta$
the argument of the complex number $c$. For $\theta \in \mathbb{R}$, it is convenient to denote the complex
number of absolute value 1, $\cos\theta + i\sin\theta$, by the symbol $e^{i\theta}$. With this convention, the
complex number $c$ is represented in polar coordinates as

\[ c = re^{i\theta}. \]

Note that $e^{i0} = 1$ and that we may view the number $c = re^{i\theta}$ as the product of 1
and $c$; this product is obtained by multiplying their moduli and adding their arguments.

In general multiplication of a vector $z$ by the vector $c$ moves the vector $z$ in the counterclockwise
direction through an angle $\theta$ and adjusts the length of the resulting vector. Thus,
geometrically, the vector (in $\mathbb{R}^2$) corresponding to the product of the non-zero complex
numbers $c_1 = r_1 e^{i\alpha_1}$ and $c_2 = r_2 e^{i\alpha_2}$ is a vector of length $r_1 r_2$ with argument $\alpha_1 + \alpha_2$.


FIGURE 2. Multiplication of complex numbers. 


From the geometric interpretation of multiplication we see that for all $\theta$ and $\varphi \in \mathbb{R}$,

\[ e^{i\theta} e^{i\varphi} = e^{i(\theta + \varphi)}. \]

We now transform the last equation to rectangular coordinates:

\[ (\cos\theta + i\sin\theta)(\cos\varphi + i\sin\varphi) = \cos(\theta + \varphi) + i\sin(\theta + \varphi). \]

Equating the respective real and imaginary parts of the complex numbers involved, we obtain
the angle addition formulae

\[ \cos\theta\cos\varphi - \sin\theta\sin\varphi = \cos(\theta + \varphi) \]

and

\[ \cos\theta\sin\varphi + \sin\theta\cos\varphi = \sin(\theta + \varphi). \]

Many other identities can be similarly derived.
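The rule "multiply the moduli, add the arguments" and the resulting angle addition formulae can be checked numerically. The sketch below uses Python's standard `cmath.rect` to convert polar to rectangular form; the function name `polar_product` and the sample angles are assumptions chosen for illustration, and the comparison is only up to floating-point rounding.

```python
import cmath
import math

def polar_product(r1, a1, r2, a2):
    """Product of r1 e^{i a1} and r2 e^{i a2} in polar form (modulus, argument)."""
    return (r1 * r2, a1 + a2)

r, a = polar_product(2.0, math.pi / 6, 3.0, math.pi / 3)
direct = cmath.rect(2.0, math.pi / 6) * cmath.rect(3.0, math.pi / 3)
print(cmath.rect(r, a), direct)   # both are close to 6i

# Angle addition formula for the cosine at sample angles
theta, phi = 0.7, 1.9
lhs = math.cos(theta) * math.cos(phi) - math.sin(theta) * math.sin(phi)
rhs = math.cos(theta + phi)
```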


REMARK 2.36. The complex numbers are complete in the sense of analysis (every Cauchy
sequence converges), and as we shall see later, in the algebraic sense (every non-constant polynomial over
$\mathbb{C}$ has a root).


CHAPTER 3 


Groups 


In this chapter we introduce, mostly through examples, the most basic of algebraic structures:
that of a group. We have already encountered several families of groups:


(1) $(\mathbb{Z}, +)$, $(\mathbb{Q}, +)$, $(\mathbb{R}, +)$ and $(\mathbb{C}, +)$.

(2) $(\{\pm 1\}, \cdot)$, $(\mathbb{Q}^*, \cdot)$, $(\mathbb{R}^*, \cdot)$ and $(\mathbb{C}^*, \cdot)$, where $R^*$ denotes the set of elements in
$R$ that are invertible with respect to multiplication (usually, but not always, the
non-zero elements in $R$).

(3) $(\mathbb{Z}_n, +)$, $n \in \mathbb{Z}_{>0}$.

(4) $(\mathbb{Z}_n^*, \cdot)$, $n \in \mathbb{Z}_{>0}$.


The first two sections of the chapter are devoted to the study of one new family of groups,
the permutation groups. In a sense to be made precise later (in Chapter 4, Section 6.1), all of
group theory consists of a study of this family. A main difference between permutation groups
and the groups previously considered is that, in general, two permutations
do not commute. In the third and final section of the chapter, we formally define the concept
of a group and study more examples.


1. Permutation groups 


This section is devoted to the study of the most basic operations on sets (mostly finite 
sets) and their self-maps that lead us very naturally to the concept of a group. 


DEFINITION 3.1. Let $X$ be a nonempty set. A permutation of $X$ is a bijection from $X$
to itself. We will denote the set of permutations of $X$ by the symbol Perm($X$).

The case of finite $X$ is of most interest. In this case it is convenient to use for $X$ the set
$X_n$ consisting of the first $n$ positive integers (we assume¹ throughout that $n \geq 2$):

\[ X_n = \{1, 2, \ldots, n\}. \]

In this case², we use the symbol $S(n)$ for Perm($X_n$) equipped with the operation of composition
of functions (which we regard as a multiplication on $S(n)$) and call $S(n)$ the symmetric
group³ on $n$ symbols (letters or elements).


An element $\pi \in S(n)$ sends the integer $j \in X_n$ to the integer $\pi(j) \in X_n$. A good way to
represent such a permutation is by a matrix consisting of two rows. The first row lists the
integers in $X_n$: $1, 2, \ldots, n$, in any convenient order, and in the second row, the entry under $j$
is $\pi(j)$:

\[ \pi = \begin{pmatrix} 1 & 2 & \cdots & n \\ \pi(1) & \pi(2) & \cdots & \pi(n) \end{pmatrix}. \]

¹ Because the case $n = 1$ is completely trivial.

² Also for the case $n = 1$. Of course, $S(1)$ consists of only one element.

³ We will subsequently define the concept of (abstract) group. Of course, these will be prime examples
of the concept.


THEOREM 3.2. Fix a positive integer $n$.

• If $\pi$ and $\sigma \in S(n)$, then so is their composite $\pi \circ \sigma$, which we denote by $\pi\sigma$.
• The identity self map of $X_n$, denoted by id or $\mathrm{id}_{X_n}$, is an element of $S(n)$.
• If $\pi \in S(n)$, then so is $\pi^{-1}$.
• $|S(n)| = n!$.


PROOF. Only the last statement needs to be verified. To construct a permutation $\pi \in
S(n)$, we may send the integer 1 to any of $n$ integers, the integer 2 to any of the remaining
$n - 1$ integers, etc. □


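EXAMPLE 3.3. We illustrate most of the concepts using examples for $n = 10$. Let

\[ \pi = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ 3 & 4 & 5 & 6 & 7 & 2 & 9 & 10 & 1 & 8 \end{pmatrix} \]

and

\[ \sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ 1 & 3 & 4 & 2 & 6 & 7 & 8 & 9 & 10 & 5 \end{pmatrix}. \]

Then

\[ \sigma\pi = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ 4 & 2 & 6 & 7 & 8 & 3 & 10 & 5 & 1 & 9 \end{pmatrix}. \]

Remember that we are composing permutations as functions; thus for two permutations
$\sigma$ and $\pi$, $(\sigma\pi)(j) = \sigma(\pi(j))$. We obtain a convenient way to multiply permutations by
realizing that reordering the columns of a given representation of a permutation does not
change the permutation. Thus we may use the order of the second row of $\pi$ to determine
the first row of $\sigma$ to obtain

\[
\pi = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ 3 & 4 & 5 & 6 & 7 & 2 & 9 & 10 & 1 & 8 \end{pmatrix}, \qquad
\sigma = \begin{pmatrix} 3 & 4 & 5 & 6 & 7 & 2 & 9 & 10 & 1 & 8 \\ 4 & 2 & 6 & 7 & 8 & 3 & 10 & 5 & 1 & 9 \end{pmatrix};
\]

the representation of $\sigma\pi$ is now easily read off; it consists of the first and fourth lines of
the last array (note that in the above "algorithm" we write down first the permutation for the
rightmost map, the one we do first). Using the reordering idea, we obtain the representation
of $\pi^{-1}$ from the one for $\pi$ by interchanging its two rows. Note that

\[ \pi\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ 3 & 5 & 6 & 4 & 2 & 9 & 10 & 1 & 8 & 7 \end{pmatrix} \neq \sigma\pi. \]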


The last example showed that the multiplication on $S(n)$ is not commutative. Note that
in order to show that two elements $\sigma$ and $\pi \in S(n)$ do not commute, it is not necessary to
compute $\pi\sigma$ and $\sigma\pi$. All we need to do is to find one $j \in X_n$ for which $\pi\sigma(j) \neq \sigma\pi(j)$. In
our example, there are many such $j$; in particular, $3 = \pi\sigma(1) \neq \sigma\pi(1) = 4$.

We note that our way of representing permutations is still rather cumbersome. A more
detailed study of $S(n)$ will also suggest better ways to represent elements of this group.
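The two-row notation translates directly into a dictionary $j \mapsto \pi(j)$, and composition becomes a one-line lookup. The sketch below is an illustration with assumed helper names (`two_row`, `compose`, `inverse`); $\pi$ is the permutation of Example 3.3, and $\sigma$ is taken to be the permutation for which $\sigma\pi$ has second row 4 2 6 7 8 3 10 5 1 9, as in the four-row array above.

```python
def two_row(second_row):
    """Permutation of {1, ..., n} whose first row is 1..n and whose second row is given."""
    return dict(zip(range(1, len(second_row) + 1), second_row))

def compose(sigma, pi):
    """sigma pi: apply pi first, then sigma, i.e. j -> sigma(pi(j))."""
    return {j: sigma[pi[j]] for j in pi}

def inverse(pi):
    """Interchange the two rows."""
    return {image: j for j, image in pi.items()}

pi = two_row([3, 4, 5, 6, 7, 2, 9, 10, 1, 8])
sigma = two_row([1, 3, 4, 2, 6, 7, 8, 9, 10, 5])

sigma_pi = compose(sigma, pi)
pi_sigma = compose(pi, sigma)
print(list(sigma_pi.values()))
print(list(pi_sigma.values()))
```

Comparing the two printed rows at $j = 1$ already shows that $\pi\sigma \neq \sigma\pi$.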




DEFINITION 3.4. Let us introduce the convention that whenever the integer $n + 1$ appears,
it is replaced⁴ by 1. A permutation $\pi \in S(n)$ is cyclic if there is a rearrangement
$x_1, x_2, \ldots, x_r, x_{r+1}, \ldots, x_n$ of the integers $1, 2, \ldots, n$ such that $\pi$ fixes $x_{r+1}, \ldots, x_n$ (that is,
$\pi(x_j) = x_j$ for $j = r+1, \ldots, n$) and cycles $x_1, x_2, \ldots, x_r$ (that is, $\pi(x_j) = x_{j+1}$ for $j = 1, \ldots, r-1$
and $\pi(x_r) = x_1$). The integer $r$ is called the length of the cycle, in symbols $l(\pi)$, and $\pi$ is
called an $r$-cycle. A 2-cycle is called a transposition.


NOTATION 3.5. The cycle defined above is conveniently represented by

\[ \pi = (x_1, x_2, \ldots, x_r). \]

Note that the fixed points $x_{r+1}, \ldots, x_n$ of the cycle do not appear at all in its new symbol.
Whenever appropriate, we will use the symbol $(\cdot)$ to denote the identity cycle. Since cycles
are special cases of permutations, we can multiply them. Note that in the product

\[ (x_1, x_2, \ldots, x_r)(y_1, y_2, \ldots, y_s) \]

we first perform the second permutation; thus
Ls S20 a A Be SO SO LO 
aan) ( 346512789 10 ) : 


We read products of cycles from right to left, but each cycle from left to right. 
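The reading convention (products from right to left, each cycle from left to right) can be made concrete as follows. The helper names `cycle` and `multiply` are assumptions for this sketch; the test values are the two products of transpositions in $S(3)$ that appear in Example 3.9.

```python
def cycle(n, *xs):
    """The cycle (xs[0], xs[1], ..., xs[-1]) as a dictionary on {1, ..., n}."""
    p = {j: j for j in range(1, n + 1)}
    for k, x in enumerate(xs):
        p[x] = xs[(k + 1) % len(xs)]    # each entry is sent to the next one
    return p

def multiply(*perms):
    """Product of permutations; the rightmost factor is performed first."""
    result = {j: j for j in perms[0]}
    for p in reversed(perms):
        result = {j: p[result[j]] for j in result}
    return result

left = multiply(cycle(3, 1, 2), cycle(3, 1, 3))    # (1,2)(1,3)
right = multiply(cycle(3, 1, 3), cycle(3, 1, 2))   # (1,3)(1,2)
print(left == cycle(3, 1, 3, 2), right == cycle(3, 1, 2, 3))
```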


DEFINITION 3.6. Let $\pi \in S(n)$. We say that $\pi$ moves $j$ ($j \in \mathbb{Z}$, $1 \leq j \leq n$) if $\pi(j) \neq j$.
Let $\pi$ and $\sigma \in S(n)$. We say that $\pi$ and $\sigma$ are disjoint if every integer moved by $\pi$ is fixed
by $\sigma$ and every integer moved by $\sigma$ is fixed by $\pi$.


LEMMA 3.7. Let $\pi \in S(n)$. If $j \in \mathbb{Z}$, $1 \leq j \leq n$, is moved by $\pi$, then so are $\pi(j)$ and
$\pi^{-1}(j)$.

PROOF. If $\pi(j)$ is not moved by $\pi$, then $\pi(\pi(j)) = \pi(j)$ and applying $\pi^{-1}$ to both sides
of this equation, we get the contradiction that $\pi(j) = j$. If $\pi^{-1}(j)$ is not moved by $\pi$, then
$j = \pi(\pi^{-1}(j)) = \pi^{-1}(j)$ and applying $\pi$ to the extreme terms of this equation, we get once
again the contradiction that $\pi(j) = j$.


THEOREM 3.8. If $\pi$ and $\sigma \in S(n)$ are disjoint, then they commute.


PROOF. Let $j \in \mathbb{Z}$, $1 \leq j \leq n$. There are three possibilities:


• Either $j$ is moved by $\pi$ and hence $j$ and $\pi(j)$ are fixed by $\sigma$. (By definition $j$ is
fixed by $\sigma$. If $j$ is moved by $\pi$, then so is $\pi(j)$ by the last lemma and hence the
disjointness of $\pi$ and $\sigma$ guarantees that $\pi(j)$ is fixed by $\sigma$.)

• Or $j$ is moved by $\sigma$ and hence $j$ and $\sigma(j)$ are fixed by $\pi$.

• Or $j$ is fixed by both $\pi$ and $\sigma$.


In the first case

\[ \pi(\sigma(j)) = \pi(j) = \sigma(\pi(j)), \]

in the second,

\[ \pi(\sigma(j)) = \sigma(j) = \sigma(\pi(j)), \]

and in the third

\[ \pi(\sigma(j)) = j = \sigma(\pi(j)). \]


⁴ We are thus using arithmetic modulo $n$; however we modified the standard representation of equivalence
classes in one case only. The equivalence class $[0]_n$ is represented by $n$ instead of 0.


Thus in all cases, $\pi(\sigma(j)) = \sigma(\pi(j))$.
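A quick experiment illustrates the theorem. The sketch below, with assumed helper names, tests disjointness through the sets of moved integers and then compares the two products; it is an illustration on two sample pairs, not a substitute for the proof.

```python
def cycle(n, *xs):
    """The cycle (xs[0], ..., xs[-1]) as a dictionary on {1, ..., n}."""
    p = {j: j for j in range(1, n + 1)}
    for k, x in enumerate(xs):
        p[x] = xs[(k + 1) % len(xs)]
    return p

def compose(p, q):
    """p q: apply q first, then p."""
    return {j: p[q[j]] for j in q}

def moved(p):
    """The set of integers moved by p."""
    return {j for j in p if p[j] != j}

def are_disjoint(p, q):
    return moved(p).isdisjoint(moved(q))

pi = cycle(10, 1, 3, 5, 7, 9)
sigma = cycle(10, 2, 4, 6)
commute = compose(pi, sigma) == compose(sigma, pi)
print(are_disjoint(pi, sigma), commute)
```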


EXAMPLE 3.9. Non-disjoint cycles need not commute. This already happens for n = 3 
since (1, 2)(1,3) = (1,3, 2) while (1,3)(1, 2) = (1, 2,3). We can, of course, view this example 
as taking place in S(10). 


EXAMPLE 3.10. Consider our Example 3.3. To represent $\pi$ by disjoint cycles, we start
with $j = 1$ and follow it around under the action of $\pi$. Note that

\[ \pi(1) = 3,\ \pi^2(1) = \pi(3) = 5,\ \pi^3(1) = \pi(5) = 7,\ \pi^4(1) = \pi(7) = 9 \text{ and } \pi^5(1) = \pi(9) = 1. \]

Thus this part of $\pi$ is represented by the cycle $(1,3,5,7,9)$. We note that 2 does not appear
in this cycle. So we now start with $j = 2$ and follow it around under the action of $\pi$:

\[ \pi(2) = 4,\ \pi^2(2) = \pi(4) = 6 \text{ and } \pi^3(2) = \pi(6) = 2. \]

Thus this part of $\pi$ is represented by the cycle $(2,4,6)$ and the first two parts of the transformation
are represented by the product $(1,3,5,7,9)(2,4,6)$ (we could have reversed the
order). Note that 8 and 10 do not appear in the last product. Continuing the process one
more step, we see that

\[ \pi = (1,3,5,7,9)(2,4,6)(8,10). \]

In the above steps we tacitly assumed that $\pi \in S(10)$. The same representation holds for
$\pi \in S(n)$ with $n > 10$ provided we view the permutation $\pi$ as fixing each integer $j$ with
$11 \leq j \leq n$. In decomposing $\pi$ into a product of disjoint cycles, the order does not matter.
Thus also

\[ \pi = (1,3,5,7,9)(8,10)(2,4,6) = (8,10)(2,4,6)(1,3,5,7,9) \]

are among the 6 possible ways of writing $\pi$ as a product of disjoint cycles.


It is not at all surprising that the above construction is quite general. We indeed have 


THEOREM 3.11. Every $\pi \in S(n)$ can be written as a product, perhaps the empty product,
of disjoint cycles. This decomposition into cycles is unique up to order.


PROOF. If $\pi$ is the identity, it is represented by the empty product. Otherwise $\pi$ does
not fix every integer. Ignore the integers fixed by $\pi$; they do not contribute to any nontrivial
cycle. More precisely we remove these integers from the first and second row of the matrix
representation of the permutation $\pi$. We now have a permutation $\pi_1$ of a subset of the
integers $1, 2, \ldots, n$. Start (what we call the process) with the smallest integer $k_1$ in the
domain of this transformation (it is the smallest integer not fixed by $\pi$) and follow $k_1$ around
through $\pi$ or $\pi_1$ to obtain a list of integers $k_1, k_2, \ldots$ such that $\pi(k_j) = k_{j+1}$. We stop this
process as soon as we get a repetition in the list $k_1, k_2, \ldots, k_{r+1}$. We must get a repetition
since for all $j$, $1 \leq k_j \leq n$. Note that $r > 1$. We claim that $k_{r+1} = k_1$. If $k_{r+1} = k_s$ with
$1 < s \leq r$, then

\[ \pi(k_{s-1}) = k_s = k_{r+1} = \pi(k_r), \]

and applying $\pi^{-1}$ to the extreme sides of the last displayed equation we get $k_{s-1} = k_r$, contradicting
the minimality of $r + 1$. Hence we have

\[ \pi(k_1) = k_2,\ \pi(k_2) = k_3,\ \ldots,\ \pi(k_{r-1}) = k_r,\ \pi(k_r) = k_1 \]

for some integer $r$, $2 \leq r \leq n$, where the collection $k_1, k_2, \ldots, k_r$ consists of distinct integers.

Thus this part of the permutation $\pi$ is represented by the cycle $(k_1, k_2, \ldots, k_r)$. Add this new
cycle constructed (either on the left or right) to the ones previously constructed. Remove the
integers in this cycle from the matrix representation of $\pi_1$. If we have obtained the empty
set, we are done. Otherwise, call this new permutation $\pi_2$ and repeat the process on it. It is
clear that we will eventually stop. This yields the desired decomposition; the uniqueness of
the decomposition up to order is obvious from the construction.
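The process described in the proof can be sketched as a short program: skip the fixed points, start at the smallest integer not yet used, and follow it around until the cycle closes. The function name `disjoint_cycles` is an assumption for this sketch; the sample permutation is the $\pi$ of Example 3.3.

```python
def disjoint_cycles(pi):
    """Decompose a permutation (dict j -> pi(j)) into disjoint non-trivial cycles."""
    seen, cycles = set(), []
    for start in sorted(pi):
        if start in seen or pi[start] == start:
            continue                   # already in a cycle, or a fixed point
        cyc, k = [], start
        while k not in seen:           # follow start around under pi
            seen.add(k)
            cyc.append(k)
            k = pi[k]
        cycles.append(tuple(cyc))
    return cycles

pi = dict(zip(range(1, 11), [3, 4, 5, 6, 7, 2, 9, 10, 1, 8]))
print(disjoint_cycles(pi))
```

The identity yields the empty list, matching the convention that it is the empty product.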


We illustrate with an example.
EXAMPLE 3.12. We simplify the product

\[ \pi = (1,4)(5,6)(4,7,3)(2,5,4)(2,3). \]

The integers 8, 9 and 10 do not appear in the above product; they are fixed by $\pi$. It is
easiest if we work with an alternate representation of the permutation. We readily compute

\[ \pi = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ 4 & 1 & 6 & 2 & 7 & 5 & 3 & 8 & 9 & 10 \end{pmatrix}, \]

from which it follows rather quickly that

\[ \pi = (1,4,2)(7,3,6,5). \]
As a result of the last theorem, we can easily construct the multiplication table for $S(n)$,
as long as $n$ is not too large. It is an $n! \times n!$ matrix (ignoring headers) where we index the
rows by $x \in S(n)$ and the columns by $y \in S(n)$. The $(x, y)$ entry in the matrix is then the
product of the permutations $x$ and $y$ (in this order; that is, $xy$). For small $n$, the calculation
can be done by hand. We illustrate with the


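MULTIPLICATION TABLE FOR S(3).


          | (·)      (1,2)    (1,3)    (2,3)    (1,2,3)  (1,3,2)
----------+------------------------------------------------------
(·)       | (·)      (1,2)    (1,3)    (2,3)    (1,2,3)  (1,3,2)
(1,2)     | (1,2)    (·)      (1,3,2)  (1,2,3)  (2,3)    (1,3)
(1,3)     | (1,3)    (1,2,3)  (·)      (1,3,2)  (1,2)    (2,3)
(2,3)     | (2,3)    (1,3,2)  (1,2,3)  (·)      (1,3)    (1,2)
(1,2,3)   | (1,2,3)  (1,3)    (2,3)    (1,2)    (1,3,2)  (·)
(1,3,2)   | (1,3,2)  (2,3)    (1,2)    (1,3)    (·)      (1,2,3)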


It is quite tedious to produce by hand the multiplication table for $S(n)$ even with relatively
small $n$. For example, for $n = 5$, the multiplication table is a $120 \times 120$ matrix.
Computers can, once again, help. We illustrate with a program that computes the multiplication
table for $S(3)$ as a check on our work and then computes the multiplication table for
a subset of $S(n)$, $n \geq 4$, consisting of 8 permutations.


MAPLE SESSION #9
> with(group):
> # the six elements of S(3), in disjoint cycle notation
> f(1) := []: f(2) := [[1,2]]: f(3) := [[1,3]]: f(4) := [[2,3]]:
  f(5) := [[1,2,3]]: f(6) := [[1,3,2]]:
> a := array(1..6,1..6):
> # mulperms(f(j),f(i)) is the product f(i)f(j) in our right-to-left convention
> for i to 6 do for j to 6 do a[i,j] := mulperms(f(j),f(i)) end do end do;
> print(a);

  []         [[1,2]]    [[1,3]]    [[2,3]]    [[1,2,3]]  [[1,3,2]]
  [[1,2]]    []         [[1,3,2]]  [[1,2,3]]  [[2,3]]    [[1,3]]
  [[1,3]]    [[1,2,3]]  []         [[1,3,2]]  [[1,2]]    [[2,3]]
  [[2,3]]    [[1,3,2]]  [[1,2,3]]  []         [[1,3]]    [[1,2]]
  [[1,2,3]]  [[1,3]]    [[2,3]]    [[1,2]]    [[1,3,2]]  []
  [[1,3,2]]  [[2,3]]    [[1,2]]    [[1,3]]    []         [[1,2,3]]

> # eight permutations in S(4), generated from g(2) and g(5)
> g(1) := []: g(2) := [[1,2,3,4]]: g(3) := mulperms(g(2), g(2)):
  g(4) := mulperms(g(2), g(3)): g(5) := [[3,4],[1,2]]:
  g(6) := mulperms(g(2), g(5)): g(7) := mulperms(g(3), g(5)):
  g(8) := mulperms(g(4), g(5)):
> b := array(1..8,1..8):
> for i to 8 do for j to 8 do b[i,j] := mulperms(g(j),g(i)) end do end do;
> print(b);

  []             [[1,2,3,4]]    [[1,3],[2,4]]  [[1,4,3,2]]    [[1,2],[3,4]]  [[2,4]]        [[1,4],[2,3]]  [[1,3]]
  [[1,2,3,4]]    [[1,3],[2,4]]  [[1,4,3,2]]    []             [[1,3]]        [[1,2],[3,4]]  [[2,4]]        [[1,4],[2,3]]
  [[1,3],[2,4]]  [[1,4,3,2]]    []             [[1,2,3,4]]    [[1,4],[2,3]]  [[1,3]]        [[1,2],[3,4]]  [[2,4]]
  [[1,4,3,2]]    []             [[1,2,3,4]]    [[1,3],[2,4]]  [[2,4]]        [[1,4],[2,3]]  [[1,3]]        [[1,2],[3,4]]
  [[1,2],[3,4]]  [[2,4]]        [[1,4],[2,3]]  [[1,3]]        []             [[1,2,3,4]]    [[1,3],[2,4]]  [[1,4,3,2]]
  [[2,4]]        [[1,4],[2,3]]  [[1,3]]        [[1,2],[3,4]]  [[1,4,3,2]]    []             [[1,2,3,4]]    [[1,3],[2,4]]
  [[1,4],[2,3]]  [[1,3]]        [[1,2],[3,4]]  [[2,4]]        [[1,3],[2,4]]  [[1,4,3,2]]    []             [[1,2,3,4]]
  [[1,3]]        [[1,2],[3,4]]  [[2,4]]        [[1,4],[2,3]]  [[1,2,3,4]]    [[1,3],[2,4]]  [[1,4,3,2]]    []

***END OF PROGRAM***


(1) MAPLE denotes the identity permutation by [] and the cycle $(a, b, c)$ by [[a,b,c]].

(2) The MAPLE command mulperms(a,b) for the product of the permutations $a$ and
$b$ computes the product $ba$; that is, MAPLE reads products from left to right, not
the way we have been doing.
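For readers without MAPLE, the same table can be produced with a few lines of Python. This is a sketch under our right-to-left convention, using the standard `itertools.permutations`; a permutation is stored as the tuple of its second row, and the names `compose`, `elements` and `table` are assumptions of the sketch.

```python
from itertools import permutations

def compose(x, y):
    """The product xy on second-row tuples: y is applied first, then x."""
    return tuple(x[y[j] - 1] for j in range(len(y)))

elements = list(permutations(range(1, 4)))      # the 3! = 6 elements of S(3)
table = {(x, y): compose(x, y) for x in elements for y in elements}

for x in elements:
    print(x, [table[(x, y)] for y in elements])
```

In this encoding $(1,2)$ is `(2, 1, 3)`, $(1,3)$ is `(3, 2, 1)` and $(1,3,2)$ is `(3, 1, 2)`, so the entry for $(1,2)(1,3)$ can be compared with the table above.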


EXERCISES
(1) In the proof of Theorem 3.8 the case "$j$ is moved by both $\pi$ and $\sigma$" does not occur.
Explain why.
(2) What are necessary and sufficient conditions for two distinct transpositions to commute?
(3) Let $n$ be an integer $\geq 2$.
• Let $\sigma$ be a permutation in $S(n)$. Show that
\[ \sigma(1,2)\sigma^{-1} = (\sigma(1), \sigma(2)). \]
• Let $1 \leq k \leq n$ and let
\[ \tau = (1, 2, \ldots, n). \]
Show that
\[ \tau^k (1,2) \tau^{-k} = (k+1, k+2). \]
How should you interpret $n+1$ and/or $n+2$ if they appear in the last equation?
• Let $1 \leq a < b \leq n$. Show that
\[ (a,b) = (a+1,a)(a+2,a+1)\cdots(b-2,b-3)(b-1,b-2)(b-1,b)\cdots(a+1,a+2)(a,a+1). \]
• Conclude that any $\sigma \in S(n)$ can be written as a product of powers of $\tau$ and
$(1,2)$.


2. The order and sign of a permutation 
For this section, we fix once and for all a positive integer n. 


DEFINITION 3.13. Let $\pi \in S(n)$. To define the powers $\pi^k$ of $\pi$, we set $\pi^0 = \mathrm{id}$ and
$\pi^1 = \pi$. For $k \in \mathbb{Z}_{>0}$, we define inductively $\pi^k = \pi\pi^{k-1}$. We also define $\pi^{-k} = (\pi^{-1})^k$. Note
that for $k = 1$, on the left hand side of the last equality $\pi^{-1}$ represents the minus one power of $\pi$, while
on the right hand side it represents the inverse of $\pi$. The same symbol is used for these two
objects because they define the same permutation.


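PROPOSITION 3.14. Let $\pi$ and $\sigma \in S(n)$ and $r$ and $s \in \mathbb{Z}$. Then

(1) $\pi^r \pi^s = \pi^{r+s}$,
(2) $(\pi^r)^s = \pi^{rs}$,
(3) $\pi^{-r} = (\pi^r)^{-1}$,
(4) if $\pi$ and $\sigma$ commute, then $\pi\sigma^r = \sigma^r\pi$, and
(5) if $\pi$ and $\sigma$ commute, then $(\pi\sigma)^r = \pi^r\sigma^r$.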


PROOF. We prove only the first and last two assertions, leaving the proofs of the other
two to the reader. To establish the first claim we fix $s$. We now prove the assertion for $r \geq 0$
by induction. The base case $r = 0$ is trivial. We assume the formula for $r \geq 0$ and prove it
for $r + 1$. Now,

\[ \pi^{r+1}\pi^s = \pi\pi^r\pi^s = \pi\pi^{r+s} = \pi^{r+1+s}; \]

the first equality uses the definition of the $r + 1$ power of $\pi$; the second, the induction hypothesis;
and the third, the definition once again. We have established (1) for all $s \in \mathbb{Z}$ and all
$r \in \mathbb{Z}_{\geq 0}$. So, by symmetry, we know that (1) holds if either $r$ or $s$ is non-negative. The
reader should at this point establish (3), which is needed for continuing with the proof of (1).
If both $r$ and $s$ are negative, then

\[ \pi^r\pi^s = (\pi^{-1})^{-r}(\pi^{-1})^{-s} = (\pi^{-1})^{-r-s} = (\pi^{-1})^{-(r+s)} = \pi^{r+s}. \]

This finishes the proof of (1). We show that (4) holds for $r \geq 0$ by induction on $r$. The base
case, $r = 0$, is a tautology. Assume (4) for $r \geq 0$. Then

\[ \pi\sigma^{r+1} = \pi\sigma^r\sigma = \sigma^r\pi\sigma = \sigma^r\sigma\pi = \sigma^{r+1}\pi. \]

To prove (4) for negative $r$, we first observe that if $\pi\sigma = \sigma\pi$, then pre-multiplying and post-multiplying
both sides by $\sigma^{-1}$, we get $\sigma^{-1}\pi = \pi\sigma^{-1}$; that is, if $\pi$ and $\sigma$ commute so do $\pi$ and
$\sigma^{-1}$ (thus also $\pi^{-1}$ and $\sigma$, as will be needed in the next displayed set of equations). Hence
for negative $r$,

\[ \pi\sigma^r = \pi(\sigma^{-1})^{-r} = (\sigma^{-1})^{-r}\pi = \sigma^r\pi. \]

We establish (5) for non-negative $r$ by induction. The base case, $r = 0$, is again obviously
true. So assume that the formula holds for $r \geq 0$. Then

\[ (\pi\sigma)^{r+1} = (\pi\sigma)^r\pi\sigma = \pi^r\sigma^r\pi\sigma = \pi^r\pi\sigma^r\sigma = \pi^{r+1}\sigma^{r+1}. \]

If $r$ is negative, then

\[ (\pi\sigma)^r = ((\pi\sigma)^{-1})^{-r} = (\sigma^{-1}\pi^{-1})^{-r} = \sigma^r\pi^r = \pi^r\sigma^r. \]
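The definition of integer powers and the law $\pi^r\pi^s = \pi^{r+s}$ can be spot-checked by machine. The sketch below follows Definition 3.13 literally (repeated multiplication by $\pi$ or by $\pi^{-1}$); the helper names are assumptions, and a finite check over a small range of exponents is of course only an illustration.

```python
def compose(p, q):
    """p q: apply q first, then p."""
    return {j: p[q[j]] for j in q}

def inverse(p):
    return {image: j for j, image in p.items()}

def power(p, k):
    """p^k for any integer k: p^0 = id, p^k = p p^(k-1), p^(-k) = (p^-1)^k."""
    base = p if k >= 0 else inverse(p)
    result = {j: j for j in p}
    for _ in range(abs(k)):
        result = compose(base, result)
    return result

pi = dict(zip(range(1, 11), [3, 4, 5, 6, 7, 2, 9, 10, 1, 8]))
law_holds = all(
    compose(power(pi, r), power(pi, s)) == power(pi, r + s)
    for r in range(-4, 5) for s in range(-4, 5)
)
print(law_holds)
```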


PROPOSITION 3.15. Let $\pi \in S(n)$. There exists an $m \in \mathbb{Z}_{>0}$ such that $\pi^m = \mathrm{id}$.


PROOF. The group $S(n)$ has $n!$ elements. The successive powers $\pi, \pi^2, \pi^3, \ldots$ all belong
to $S(n)$. Hence there must exist positive integers $r < s$ such that $\pi^r = \pi^s$. Multiplying both
sides by $\pi^{-r}$ shows that $\mathrm{id} = \pi^{s-r}$.


DEFINITION 3.16. The order of the permutation $\pi \in S(n)$ (in symbols, $o(\pi)$) is the
smallest positive integer $m$ such that $\pi^m = \mathrm{id}$.


EXAMPLE 3.17. We record several elementary facts about orders of permutations.
• The order of the identity is 1 and this is the only permutation of order 1.
• The order of every transposition is 2.
• The successive powers of the cycle $(1,3,5,7,9) \in S(10)$ are
$(1,3,5,7,9)$, $(1,5,9,3,7)$, $(1,7,3,9,5)$, $(1,9,7,5,3)$ and $(\cdot)$.
Thus the cycle $(1,3,5,7,9)$ has order 5.


REMARK 3.18. The properties of the order function on $S(n)$ should be compared to the
(multiplicative) order function on $\mathbb{Z}_n^*$.


THEOREM 3.19. Let $r$ and $s \in \mathbb{Z}$. If $\pi \in S(n)$ has order $m$, then $\pi^r = \pi^s$ if and only if
$r \equiv s \bmod m$.


PROOF. Assume without loss of generality that $r \geq s$. Now $\pi^r = \pi^s$ if and only if
$\pi^{r-s} = \mathrm{id}$. Thus we establish the theorem by showing that for $q \in \mathbb{Z}$, $\pi^q = \mathrm{id}$ if and only
if $q \equiv 0 \bmod m$. If $q = km$ for some $k \in \mathbb{Z}$, then $\pi^q = (\pi^m)^k = \mathrm{id}$. Conversely, assume
that $\pi^q = \mathrm{id}$. Using the division algorithm, we write $q = km + p$, with $k$ and $p \in \mathbb{Z}$ and
$0 \leq p < m$. Thus $\pi^p = \pi^{q-km} = \pi^q(\pi^m)^{-k} = \mathrm{id}(\mathrm{id})^{-k} = \mathrm{id}$, and thus by the minimality of
$m$, $p = 0$.


PROPOSITION 3.20. The order $o(\pi)$ of a cycle $\pi \in S(n)$ is its length.


PROOF. Let $\pi = (a_1, a_2, \ldots, a_m)$ be a cycle of length $m$. This cycle moves an $a_j$ to $a_{j+1}$,
provided that for subscripts we interpret all operations modulo $m$. Thus $\pi^r$ moves $a_j$ to $a_{j+r}$
and $r = m$ is the first positive power of $\pi$ that fixes each $a_j$.


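PROPOSITION 3.21. Let $\pi$ and $\sigma$ be disjoint cycles in $S(n)$. Then

(10) $o(\pi\sigma) = \mathrm{lcm}(o(\pi), o(\sigma))$.


PROOF. Let $r = o(\pi)$, $s = o(\sigma)$ and $d = \mathrm{lcm}(r,s)$. Then $d = ra = sb$, for some $a$ and
$b \in \mathbb{Z}_{>0}$. Thus because $\pi$ and $\sigma$ commute, $(\pi\sigma)^d = \pi^d\sigma^d = (\pi^r)^a(\sigma^s)^b = \mathrm{id}$. It follows that $o(\pi\sigma) \mid d$.
Suppose that $(\pi\sigma)^e = \pi^e\sigma^e = \mathrm{id}$ for some $e \in \mathbb{Z}_{>0}$ (thus $o(\pi\sigma) \mid e$). Choose an integer $k$,
$1 \leq k \leq n$. If $k$ is moved by $\pi$, then it is fixed by $\sigma$ (hence also by $\sigma^e$). Thus

\[ k = \mathrm{id}(k) = \pi^e(\sigma^e(k)) = \pi^e(k). \]

If $k$ is fixed by $\pi$, then trivially $\pi^e(k) = k$. Hence $\pi^e = \mathrm{id}$ and
thus $r = o(\pi) \mid e$. Similarly, $s = o(\sigma) \mid e$. We conclude that $d = \mathrm{lcm}(r,s) \mid e$. In particular,
$d \mid o(\pi\sigma)$ and we conclude that $d = o(\pi\sigma)$.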




In the last proof we never used the fact that $\pi$ and $\sigma$ were cycles; only that they were
disjoint permutations. Hence we also have


COROLLARY 3.22 (of proof). If $\pi$ and $\sigma$ are disjoint permutations in $S(n)$, then (10)
holds.


EXAMPLE 3.23. We have seen that the permutation

\[ \pi = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ 3 & 4 & 5 & 6 & 7 & 2 & 9 & 10 & 1 & 8 \end{pmatrix} \]

is decomposed as $(1,3,5,7,9)(2,4,6)(8,10)$. Thus (formally the conclusion follows only after
we have established the next theorem)

\[ o(\pi) = \mathrm{lcm}(5, 3, 2) = 30. \]


THEOREM 3.24. Let $\pi = \tau_1\tau_2\cdots\tau_k$ be the decomposition of $\pi \in S(n)$ as a product of
disjoint cycles. Then

\[ o(\pi) = \mathrm{lcm}(o(\tau_1), o(\tau_2), \ldots, o(\tau_k)). \]


PROOF. We use induction on $k$. The base case $k = 1$ is of course trivial. Assume
that $k > 1$ and that we have the formula for permutations which are products of $k - 1$
disjoint cycles. If $\pi = (\tau_1\tau_2\cdots\tau_{k-1})\tau_k$ where the $k$ cycles are disjoint, then the permutations
$\tau_1\tau_2\cdots\tau_{k-1}$ and $\tau_k$ are also disjoint. Thus by Corollary 3.22,

\[ o((\tau_1\tau_2\cdots\tau_{k-1})\tau_k) = \mathrm{lcm}(o(\tau_1\tau_2\cdots\tau_{k-1}), o(\tau_k)), \]

and by the induction assumption,

\[ o(\tau_1\tau_2\cdots\tau_{k-1}) = \mathrm{lcm}(o(\tau_1), o(\tau_2), \ldots, o(\tau_{k-1})). \]

Finally,

\[ \mathrm{lcm}(o(\tau_1), o(\tau_2), \ldots, o(\tau_k)) = \mathrm{lcm}(\mathrm{lcm}(o(\tau_1), o(\tau_2), \ldots, o(\tau_{k-1})), o(\tau_k)). \]


REMARK 3.25. The usefulness of the theorem is due to the fact that $o(\tau_j)$ is the length
of $\tau_j$.
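The theorem gives a fast way to compute orders: read off the cycle lengths and take their least common multiple. The sketch below, with assumed function names, does this and compares the answer with the brute-force definition (the smallest $m$ with $\pi^m = \mathrm{id}$) for the permutation of Example 3.23.

```python
from functools import reduce
from math import gcd

def cycle_lengths(pi):
    """Lengths of the non-trivial disjoint cycles of pi, in increasing order."""
    seen, lengths = set(), []
    for start in sorted(pi):
        length, k = 0, start
        while k not in seen:
            seen.add(k)
            length += 1
            k = pi[k]
        if length > 1:
            lengths.append(length)
    return sorted(lengths)

def order(pi):
    """Order of pi as the lcm of its cycle lengths (1 for the identity)."""
    return reduce(lambda a, b: a * b // gcd(a, b), cycle_lengths(pi), 1)

def order_by_powers(pi):
    """Order of pi straight from the definition: smallest m with pi^m = id."""
    identity = {j: j for j in pi}
    current, m = dict(pi), 1
    while current != identity:
        current = {j: pi[current[j]] for j in pi}
        m += 1
    return m

pi = dict(zip(range(1, 11), [3, 4, 5, 6, 7, 2, 9, 10, 1, 8]))
print(cycle_lengths(pi), order(pi), order_by_powers(pi))
```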


DEFINITION 3.26. Let $\pi = \tau_1\tau_2\cdots\tau_k$ be the decomposition of a non-trivial (meaning $\pi \neq
\mathrm{id}$) permutation $\pi \in S(n)$ as a product of disjoint cycles. Since the $\tau_j$ commute, it involves
no loss of generality to assume that

\[ o(\tau_1) \leq o(\tau_2) \leq \cdots \leq o(\tau_k). \]

We call the $k$-tuple $(o(\tau_1), o(\tau_2), \ldots, o(\tau_k))$ the shape of $\pi$. The identity permutation has the
empty shape. Two permutations $\pi_1$ and $\pi_2 \in S(n)$ are conjugate if there exists a $\sigma \in S(n)$
such that $\pi_2 = \sigma\pi_1\sigma^{-1}$.
Conjugation is an equivalence relation on $S(n)$. We verify that it satisfies the three
required properties.

• (Reflexivity) Every $\pi \in S(n)$ is conjugate to itself ($\pi = \mathrm{id}\,\pi\,\mathrm{id}^{-1}$).

• (Symmetry) If for $\pi_1$ and $\pi_2 \in S(n)$, $\pi_2 = \sigma\pi_1\sigma^{-1}$ for some $\sigma \in S(n)$, then $\pi_1 =
\sigma^{-1}\pi_2\sigma$.

• (Transitivity) If for $\pi_1$, $\pi_2$ and $\pi_3 \in S(n)$ there exist $\sigma_1$ and $\sigma_2 \in S(n)$ such that
$\pi_2 = \sigma_1\pi_1\sigma_1^{-1}$ and $\pi_3 = \sigma_2\pi_2\sigma_2^{-1}$, then

\[ \pi_3 = \sigma_2\pi_2\sigma_2^{-1} = \sigma_2\sigma_1\pi_1\sigma_1^{-1}\sigma_2^{-1} = (\sigma_2\sigma_1)\pi_1(\sigma_2\sigma_1)^{-1}. \]

Thus conjugation partitions $S(n)$ into conjugacy classes. The conjugacy class of $\pi \in S(n)$
is the subset (of $S(n)$)

\[ \{\sigma\pi\sigma^{-1} ; \sigma \in S(n)\}. \]

It is easily seen that the conjugacy class of the identity consists only of one element (namely,
id).


THEOREM 3.27. The permutations 1, and m2 € S(n) are conjugate if and only if they 
have the same shape. 


PROOF. It is clear by the above remarks that we may assume that both $\pi_1$ and $\pi_2 \neq$
id. We begin with some further general remarks about conjugation that help us understand
what this operation means. For any three permutations $\pi_1$, $\pi_2$ and $\sigma \in S(n)$,

\[ \sigma(\pi_1\pi_2)\sigma^{-1} = (\sigma\pi_1\sigma^{-1})(\sigma\pi_2\sigma^{-1}); \]

that is, conjugation (by the same permutation) preserves products. Next, if $\pi_1$ sends $i \in X_n$
to $j$ and $\pi_2 = \sigma\pi_1\sigma^{-1}$, then $\pi_2$ sends $\sigma(i)$ to $\sigma(j)$; that is, conjugation corresponds to a
relabeling of the elements of $X_n$. Thus if $\pi_1$ is the cycle $(x_1, x_2, \ldots, x_r)$, then

\[ \pi_2 = \sigma\pi_1\sigma^{-1} = (\sigma(x_1), \sigma(x_2), \ldots, \sigma(x_r)). \]

We note that the cycles $\pi_1$ and $\pi_2$ have the same length.
We are now ready to prove the theorem. We assume that $\pi_1$ and $\pi_2 \in S(n)$ are conjugate
via the permutation $\sigma$; that is, $\pi_2 = \sigma\pi_1\sigma^{-1}$. Assume that

\[ \pi_1 = \tau_1\tau_2\cdots\tau_k \tag{11} \]

is the decomposition of $\pi_1$ as a product of disjoint cycles. Then

\[ \pi_2 = \sigma\pi_1\sigma^{-1} = (\sigma\tau_1\sigma^{-1})(\sigma\tau_2\sigma^{-1})\cdots(\sigma\tau_k\sigma^{-1}) \]

is a decomposition of its conjugate as a product of disjoint cycles. Since $\tau_i$ and $\sigma\tau_i\sigma^{-1}$ have
the same length, the only if part of the theorem has been established. To prove the if part of
the theorem we assume that $\pi_1$ and $\pi_2$ have the same shape. Let (11) be the decomposition
of $\pi_1$ as a product of disjoint cycles. Since $\pi_2$ has the same shape as $\pi_1$, its decomposition
as a product of disjoint cycles is given by

\[ \pi_2 = \tau_1'\tau_2'\cdots\tau_k', \]

where we have the same number ($k$) of disjoint cycles and each of the cycles $\tau_i'$ (of $\pi_2$) has
the same length as the corresponding cycle $\tau_i$ (of $\pi_1$). Let $(x_1, x_2, \ldots, x_r)$ be a typical cycle $\tau_i$
and $(x_1', x_2', \ldots, x_r')$ be the corresponding cycle $\tau_i'$. We define a permutation $\sigma \in S(n)$ by
$\sigma(x_j) = x_j'$ if $x_j$ appears in one of the cycles in the decomposition of $\pi_1$. This definition
makes sense since a fixed $x_j$ appears in at most one such cycle. The elements that appear in
none of the cycles of $\pi_1$ are fixed points of $\pi_1$; because the shapes agree, $\pi_2$ has the same
number of such elements, and we let $\sigma$ map the former bijectively (in any manner) onto the
latter. It is easy to see that this indeed defines a permutation on
$n$ letters and that it conjugates $\pi_1$ to $\pi_2$.
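The construction in the if part of the proof can be sketched in code. The following is a minimal illustration, not part of the original text: permutations are represented as Python dictionaries, 1-cycles are included in the cycle decomposition so that fixed points are matched automatically, and the helper names `cycles`, `conjugator`, `compose` and `inverse` are hypothetical.

```python
def cycles(perm):
    """Disjoint cycles of a permutation given as a dict; 1-cycles are included."""
    seen, result = set(), []
    for start in sorted(perm):
        if start not in seen:
            cycle, x = [], start
            while x not in seen:
                seen.add(x)
                cycle.append(x)
                x = perm[x]
            result.append(cycle)
    return result


def conjugator(pi1, pi2):
    """Return sigma with sigma pi1 sigma^{-1} = pi2, or None if the shapes differ."""
    c1 = sorted(cycles(pi1), key=len)
    c2 = sorted(cycles(pi2), key=len)
    if [len(c) for c in c1] != [len(c) for c in c2]:
        return None
    sigma = {}
    for cyc, cyc_prime in zip(c1, c2):
        # send x_j to x_j' for corresponding cycles of equal length
        for x, x_prime in zip(cyc, cyc_prime):
            sigma[x] = x_prime
    return sigma


def compose(f, g):
    """(f g)(x) = f(g(x)): apply g first, then f."""
    return {x: f[g[x]] for x in g}


def inverse(f):
    return {y: x for x, y in f.items()}


pi1 = {1: 2, 2: 3, 3: 1, 4: 5, 5: 4}   # (1,2,3)(4,5)
pi2 = {1: 3, 3: 1, 2: 4, 4: 5, 5: 2}   # (1,3)(2,4,5)
sigma = conjugator(pi1, pi2)
conjugate = compose(sigma, compose(pi1, inverse(sigma)))
five_cycle = {1: 2, 2: 3, 3: 4, 4: 5, 5: 1}
```

Here `conjugate` should coincide with `pi2`, while `conjugator(pi1, five_cycle)` returns `None` because a 5-cycle has a different shape.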


We proceed to the definition of another invariant of a permutation, its sign. In the next 
definition we use formal expressions in variables (indeterminates) indexed by integers. 




DEFINITION 3.28. For $n \in \mathbb{Z}_{\geq 2}$, we define a polynomial $\Delta$ in the $n$ indeterminates $x_1$,
$x_2$, ..., $x_n$ by
\[ \Delta(x_1, x_2, \ldots, x_n) = \prod_{1 \le i < j \le n} (x_i - x_j), \]
and for $\pi \in S(n)$, we define the polynomial $\pi\Delta$ by
\[ \pi\Delta(x_1, x_2, \ldots, x_n) = \Delta(x_{\pi(1)}, x_{\pi(2)}, \ldots, x_{\pi(n)}). \]
(The polynomial $\pi\Delta$ is obtained from the polynomial $\Delta$ by replacing each appearance of $x_i$
by $x_{\pi(i)}$.) The expressions $(x_i - x_j)$ appearing in the definition of $\Delta$ are called its factors.
They are transformed to factors $(x_{\pi(i)} - x_{\pi(j)})$ in $\pi\Delta$.
EXAMPLE 3.29. For $n = 3$ and $\pi = (1,2,3) = (1,3)(1,2)$,
\[ \Delta(x_1, x_2, x_3) = (x_1 - x_2)(x_1 - x_3)(x_2 - x_3) \]
and
\[ \pi\Delta(x_1, x_2, x_3) = (x_2 - x_3)(x_2 - x_1)(x_3 - x_1) = (-1)^2\Delta(x_1, x_2, x_3). \]
Note that for $\pi$ and $\sigma \in S(n)$,
\[ (\pi\sigma)\Delta = \pi(\sigma\Delta) \]
since each side is obtained by replacing $x_i$ by $x_{\pi(\sigma(i))}$. The next lemma is, at least at first
glance, rather surprising. Its proof is also surprising; it is almost trivial.
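LEMMA 3.30. For each $\pi \in S(n)$, $\pi\Delta = \pm\Delta$.
PROOF. Both $\Delta$ and $\pi\Delta$ have the same number of factors. Consider one of these factors
$(x_i - x_j)$ in $\Delta$ (thus $1 \le i < j \le n$). The corresponding factor in $\pi\Delta$ is $(x_{\pi(i)} - x_{\pi(j)})$.
We can, of course, assert that $\pi(i) \neq \pi(j)$ because $\pi$ is a bijection of $X_n$. Further, for the
same reason, there exist unequal positive integers $k$ and $l$, each $\le n$, such that $k = \pi(i)$ and
$l = \pi(j)$. Thus either $(x_{\pi(i)} - x_{\pi(j)})$ is a factor of $\Delta$ (if $k < l$) or $-(x_{\pi(i)} - x_{\pi(j)})$ is a factor
of $\Delta$ (if $k > l$). Thus each factor of $\pi\Delta$ is either plus or minus a factor of $\Delta$. Since $\pi$ is a
bijection, distinct factors of $\Delta$ lead in this way to distinct factors of $\Delta$, so each factor of $\Delta$
arises exactly once. The result follows by collecting (multiplying) all the minus signs.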


DEFINITION 3.31. The sign of the permutation $\pi \in S(n)$, $\operatorname{sgn}(\pi)$, whose value is $\pm 1$,
is defined by $\pi\Delta = \operatorname{sgn}(\pi)\Delta$. The definition makes sense as a result of the last lemma. A
permutation $\pi$ is even if $\operatorname{sgn}(\pi) = 1$ and odd otherwise.


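THEOREM 3.32. For $\pi$ and $\sigma \in S(n)$,
\[ \operatorname{sgn}(\pi\sigma) = \operatorname{sgn}(\pi)\operatorname{sgn}(\sigma). \]
PROOF. From the definitions,
\[ (\pi\sigma)\Delta = \operatorname{sgn}(\pi\sigma)\Delta \]
and
\[ \pi(\sigma\Delta) = \operatorname{sgn}(\pi)\sigma\Delta = \operatorname{sgn}(\pi)\operatorname{sgn}(\sigma)\Delta. \]
Since $(\pi\sigma)\Delta = \pi(\sigma\Delta)$, the two right-hand sides agree.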
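The definition of the sign can be tried out numerically. The sketch below is an illustration under stated assumptions rather than anything from the text: a permutation $\pi \in S(n)$ is stored as a tuple whose entry in position $i$ is $\pi(i)$, and the sign is read off by counting the factors $(x_i - x_j)$ of $\Delta$ that turn into minus a factor of $\Delta$ (those with $\pi(i) > \pi(j)$). The names `sign` and `compose` are hypothetical.

```python
from itertools import permutations


def sign(pi):
    """Sign of pi, a tuple with pi[i-1] = pi(i), read off from the factors of Delta."""
    n = len(pi)
    s = 1
    for i in range(n):
        for j in range(i + 1, n):
            # (x_i - x_j) becomes (x_pi(i) - x_pi(j)); this is minus a factor
            # of Delta exactly when pi(i) > pi(j)
            if pi[i] > pi[j]:
                s = -s
    return s


def compose(pi, sigma):
    """(pi sigma)(i) = pi(sigma(i)): apply sigma first, then pi."""
    return tuple(pi[sigma[i] - 1] for i in range(len(sigma)))


three_cycle = (2, 3, 1)      # the cycle (1,2,3) of Example 3.29
transposition = (2, 1, 3)    # the transposition (1,2)

# check sgn(pi sigma) = sgn(pi) sgn(sigma) on all of S(4)
s4 = list(permutations(range(1, 5)))
multiplicative = all(
    sign(compose(p, q)) == sign(p) * sign(q) for p in s4 for q in s4
)
```

For the 3-cycle two factors change sign, in agreement with the factor $(-1)^2$ in Example 3.29.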


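PROPOSITION 3.33. The sgn function satisfies the following properties.
(1) $\operatorname{sgn}(\mathrm{id}) = 1$.
(2) For all $\pi \in S(n)$, $\operatorname{sgn}(\pi) = \operatorname{sgn}(\pi^{-1})$.
(3) For all $\pi$ and $\sigma \in S(n)$, $\operatorname{sgn}(\sigma\pi\sigma^{-1}) = \operatorname{sgn}(\pi)$.
(4) Every transposition has sign $-1$.
PROOF. Since $\mathrm{id}\Delta = \Delta$, (1) follows. If $\pi \in S(n)$, then by the previous theorem and (1),
\[ \operatorname{sgn}(\pi)\operatorname{sgn}(\pi^{-1}) = \operatorname{sgn}(\pi\pi^{-1}) = \operatorname{sgn}(\mathrm{id}) = 1. \]
So $\operatorname{sgn}(\pi)$ and $\operatorname{sgn}(\pi^{-1})$ are either both $+1$ or both $-1$, establishing (2). By the previous theorem,
for all $\pi$ and $\sigma \in S(n)$,
\[ \operatorname{sgn}(\sigma\pi\sigma^{-1}) = \operatorname{sgn}(\sigma)\operatorname{sgn}(\pi)\operatorname{sgn}(\sigma^{-1}) \]
and since $\sigma$ and $\sigma^{-1}$ have the same sign by (2), (3) follows. The transposition $(1,2)$ clearly
has sign $-1$ (it changes the sign of the factor $(x_1 - x_2)$ and permutes the remaining factors of
$\Delta$ among themselves), and since an arbitrary transposition has the same shape as $(1,2)$, it is conjugate
to $(1,2)$ by Theorem 3.27 and thus has sign $-1$ as a consequence of (3), finishing the proof
of (4).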


EXAMPLE 3.34. Let $A(n)$ denote the set of even permutations in $S(n)$. Then

• $\mathrm{id} \in A(n)$.

• If $\pi$ and $\sigma \in A(n)$, then $\pi\sigma \in A(n)$.

• If $\pi \in A(n)$, then $\pi^{-1} \in A(n)$.

• If $\pi \in A(n)$ and $\sigma \in S(n)$, then $\sigma\pi\sigma^{-1} \in A(n)$.

• Since $n \geq 2$, $(1,2) \notin A(n)$ and so the inclusion $A(n) \subset S(n)$ is proper.

• For $n > 1$, the map $\pi \mapsto (1,2)\pi$ sends even permutations bijectively onto odd
permutations, so that
\[ |A(n)| = \frac{n!}{2} \quad \text{for } n \geq 2. \]

• In language to be subsequently introduced, $A(n)$ is a normal subgroup of index two
of $S(n)$, and for $n > 1$,
\[ \operatorname{sgn} : S(n) \to \{\pm 1\} \]
is a surjective homomorphism with kernel $A(n)$.
LEMMA 3.35. Every cycle $\pi$ is a product of transpositions. Further, if $\pi$ has length $r$, then
\[ \operatorname{sgn}(\pi) = (-1)^{r-1}. \]
PROOF. We easily check that

\[ (x_1, x_2, \ldots, x_r) = (x_1, x_r)(x_1, x_{r-1}) \cdots (x_1, x_3)(x_1, x_2). \]

The lemma is an immediate consequence of this identity.
The lemma and our previous results imply


THEOREM 3.36. Every permutation is a product of transpositions. The number of
transpositions is even if and only if the permutation is.


EXAMPLE 3.37. We have been studying
\[ \pi = (1,3,5,7,9)(2,4,6)(8,10) = (1,9)(1,7)(1,5)(1,3)(2,6)(2,4)(8,10) \]
and thus
\[ \pi^{-1} = (1,3)(1,5)(1,7)(1,9)(2,4)(2,6)(8,10). \]




REMARK 3.38. The last identity is a consequence of the fact that if
\[ \pi = \tau_1\tau_2\cdots\tau_k \]
is a decomposition of $\pi$ as a product of transpositions, then
\[ \pi^{-1} = \tau_k\tau_{k-1}\cdots\tau_1; \]
which follows immediately from the fact that each 2-cycle is its own inverse.
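The identity behind Lemma 3.35 and the computation of Example 3.37 can be checked mechanically. The following sketch is only an illustration; the function names are hypothetical, cycles are stored as tuples, and products are applied with the rightmost factor first, as in the text.

```python
def cycle_to_transpositions(cycle):
    """(x1,...,xr) = (x1,xr)(x1,x_{r-1})...(x1,x2), rightmost factor applied first."""
    x1 = cycle[0]
    return [(x1, x) for x in reversed(cycle[1:])]


def apply_product(factors, x):
    """Apply a product of cycles to the point x, rightmost factor first."""
    for cyc in reversed(factors):
        if x in cyc:
            x = cyc[(cyc.index(x) + 1) % len(cyc)]
    return x


cycles_of_pi = [(1, 3, 5, 7, 9), (2, 4, 6), (8, 10)]
transpositions = [t for c in cycles_of_pi for t in cycle_to_transpositions(c)]

# an odd number of transpositions gives an odd permutation (Theorem 3.36)
sign_of_pi = (-1) ** len(transpositions)

# Remark 3.38: reversing the order of the transpositions yields the inverse
inverse_factors = list(reversed(transpositions))
```

The list `transpositions` reproduces the seven transpositions displayed in Example 3.37.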


EXERCISES
(1) What is $|A(1)|$?
(2) Determine the order and sign of
• $(1,2,3,4,5)(10,8,6)(9,11)$
• $(1,2,3,4,5)(10,5,6)(9,11)$
(3) Is every permutation of order 2 a transposition?
(4) Let $n \geq 2$. Show that every transposition is a product of transpositions of the form
$(k, k+1)$, $1 \le k \le n-1$.
(5) Let $n \geq 3$.
(a) Show that a product of two transpositions in $S(n)$ is also a product of 3-cycles.
(b) Show that the elements of $A(n)$ are products of 3-cycles.
(c) Let $\pi \in A(n)$ be a $k$-cycle. Write $\pi$ as a product of $l$ 3-cycles. What is the
minimum such $l$?


3. Definitions and more examples of groups 


DEFINITION 3.39. Let $X$ be a set. A binary operation or product on $X$ is a map $* :
X \times X \to X$; thus an assignment of an element $x * y \in X$ to each ordered pair $(x,y) \in X \times X$. This
property is also called closure of $X$ under $*$.


DEFINITION 3.40. A group $(G,*)$ is a set $G$ with a binary operation $*$ on $G$ with the
following properties.
• (Associativity) For all $g$, $h$ and $k \in G$, $(g*h)*k = g*(h*k)$. (We say that the
binary operation $*$ on $G$ is associative.)
• (Existence of identity) There exists an identity element $e \in G$ such that $e*g =
g*e = g$ for all $g \in G$.
• (Existence of inverses) For each $g \in G$, there exists an inverse $g^{-1} \in G$ such that
$g^{-1}*g = g*g^{-1} = e$.

DEFINITION 3.41. A group $(G,*)$ is called abelian or commutative if $g*h = h*g$ for all
$g$ and $h \in G$. In this case we say that the binary operation $*$ on $G$ is commutative.
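For a finite set the axioms of Definitions 3.40 and 3.41 can be verified by brute force. The sketch below is a hedged illustration, not a method taken from the text: the functions `is_group` and `is_abelian` are hypothetical names, the operation is passed as a Python function of two arguments, and the running time grows like the cube of the number of elements, so it is only sensible for small examples.

```python
from itertools import product


def is_group(elements, op):
    """Check closure, associativity, identity and inverses for a finite set."""
    elements = list(elements)
    # closure: op must map pairs of elements back into the set
    if any(op(a, b) not in elements for a, b in product(elements, repeat=2)):
        return False
    # associativity
    if any(op(op(a, b), c) != op(a, op(b, c))
           for a, b, c in product(elements, repeat=3)):
        return False
    # existence of a two-sided identity
    identities = [e for e in elements
                  if all(op(e, g) == g == op(g, e) for g in elements)]
    if not identities:
        return False
    e = identities[0]
    # existence of two-sided inverses
    return all(any(op(g, h) == e == op(h, g) for h in elements) for g in elements)


def is_abelian(elements, op):
    elements = list(elements)
    return all(op(a, b) == op(b, a) for a, b in product(elements, repeat=2))


add_mod_6 = lambda a, b: (a + b) % 6
mul_mod_6 = lambda a, b: (a * b) % 6
mul_mod_8 = lambda a, b: (a * b) % 8
sub_mod_5 = lambda a, b: (a - b) % 5
```

With these operations, the integers mod 6 pass the test under addition and fail it under multiplication (0 has no inverse), and subtraction mod 5 fails associativity.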


NOTATION 3.42. Some of the standard conventions are the following. 


• We usually identify the group $(G,*)$ with the set $G$, and say the group $G$ when the
corresponding binary operation $*$ is understood from the context.

• The binary operation $*$ is often written as $\cdot$ and also dropped from the notation
completely. Thus $xy$, $x*y$ and $x \cdot y$ all stand for the product of $x$ and $y$ (in that
order) in a group $G$.

• One never uses (interchangeably, for example) two different symbols (for example,
$*$ and $\cdot$) for the same binary operation.

• For commutative groups, the binary operation is usually written as $+$ and called a
sum (rather than product), and an inverse of $g$ will be denoted by $-g$.


The definitions have some immediate consequences. Among them is 
THEOREM 3.43. The identity and inverses in groups are unique. 


PROOF. Let $e$ and $e'$ be identity elements of a group $G$. Then

\[ e = ee' = e'. \]

The first equality uses the fact that $e'$ is an identity element; the second, that $e$ is. Now let
$g \in G$ and assume it has $h$ and $k$ as inverses. Then

\[ h = he = h(gk) = (hg)k = ek = k. \]


REMARK 3.44. For groups $(G,\cdot)$, the identity $e$ is often written as 1; as 0, for abelian
groups $(G,+)$. The identity element $e$ of $(G,*)$ will be denoted by $e_G$ when we need to
emphasize which identity (group) is needed.


The next worksheet involves ideas from linear algebra and is preparation for an alternate 
discussion of Example (22) below. 


WORKSHEET #4 
Orthogonal affine transformations, a review. 


(1) Let $n$ be a positive integer. In this exercise we review affine orthogonal transformations
of $\mathbb{R}^n$, with particular attention to the case $n = 2$. For this special case,
all claims appearing below should be verified. One of the aims of this worksheet
is to explore the interplay between calculations and geometric ideas, between
the Cartesian plane $\mathbb{R}^2$ and the complex plane $\mathbb{C}$.

(2) Recall that an $n \times n$ real matrix $A$ is orthogonal iff $A^T A = I$. An affine orthogonal
transformation is a self-map of $\mathbb{R}^n$ defined by sending the column vector $v \in \mathbb{R}^n$ to
the vector $Av + a$, where $A$ is a fixed orthogonal $n \times n$ matrix and $a \in \mathbb{R}^n$ is a fixed
vector.

(3) Show that the determinant of a real orthogonal $n \times n$ matrix $A$ must be either $+1$
or $-1$ by using the fact that for $n \times n$ matrices $A$ and $B$,

\[ \det AB = \det A \det B. \]

(4) Consider the case $n = 2$ and the real orthogonal matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. Conclude
that the four real numbers $a$, $b$, $c$ and $d$ satisfy the three equations

\[ a^2 + c^2 = 1, \]
\[ b^2 + d^2 = 1 \]
and
\[ ab + cd = 0. \]




(5) The next task is to solve (simultaneously) the last three equations. The first of
these equations tells us that the point $(a,c) \in \mathbb{R}^2$ lies on the circle with center at
the origin and radius 1; hence $a = \cos\theta$ and $c = \sin\theta$ for a unique real number $\theta$
with $0 \le \theta < 2\pi$.

Similarly the second equation tells us that $b = \cos\varphi$ and $d = \sin\varphi$ for some
unique real number $\varphi$ with $0 \le \varphi < 2\pi$.

Conclude from the third equation that $\tan\theta\tan\varphi = -1$ (more precisely, that
$\cos(\theta - \varphi) = 0$) and hence that $\varphi = \theta \pm \frac{\pi}{2}$ up to a multiple of $2\pi$. Hence also conclude that

\[ A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \quad \text{or} \quad A = \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}. \]

Note that these two cases correspond to the two different possibilities for the sign
of the determinant of $A$.

(6) Represent vectors in $\mathbb{R}^2$ as columns $X = \begin{bmatrix} x \\ y \end{bmatrix}$ with $x$ and $y \in \mathbb{R}$. The orthogonal
matrix $A$ acts on $\mathbb{R}^2$ by sending the vector $X$ to $AX$. In the two cases we have
described we get

\[ AX = \begin{bmatrix} x\cos\theta - y\sin\theta \\ x\sin\theta + y\cos\theta \end{bmatrix} \quad \text{and} \quad AX = \begin{bmatrix} x\cos\theta + y\sin\theta \\ x\sin\theta - y\cos\theta \end{bmatrix}, \]

respectively.

(7) A pair of real numbers $(x, y)$ can be represented in rectangular coordinates by the
single complex number $z = x + iy$. If $z \neq 0$, it can also be represented in polar
coordinates by $re^{i\theta}$, where $r = \sqrt{x^2 + y^2}$ and $\theta$ is determined (up to a multiple of $2\pi$)
by $\sin\theta = \frac{y}{r}$ and $\cos\theta = \frac{x}{r}$. We can in this
context think of $e^{i\theta}$ as a shorthand form of $\cos\theta + i\sin\theta$.

(8) In terms of complex numbers, our first map sends $z = x + iy$ to

\[ (x\cos\theta - y\sin\theta) + i(x\sin\theta + y\cos\theta) = (\cos\theta + i\sin\theta)(x + iy) = e^{i\theta}z \]

and the second to

\[ (x\cos\theta + y\sin\theta) + i(x\sin\theta - y\cos\theta) = (\cos\theta + i\sin\theta)(x - iy) = e^{i\theta}\bar{z} = \overline{e^{-i\theta}z}. \]

(9) Geometrically, the first case corresponds to a counter-clockwise rotation of $\mathbb{C}$ about the
origin by an angle $\theta$. The second case, to complex conjugation followed by a
counter-clockwise rotation by an angle $\theta$ or, equivalently, a clockwise rotation by an angle $\theta$
followed by complex conjugation.

(10) The analysis of the case $n = 3$ is similar, but requires (much) more work.
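The claims of the worksheet for $n = 2$ can be spot-checked numerically. The sketch below is an illustration only; the function names (`rotation`, `reflection`, `act`, `det`) and the sample values of $\theta$ and $z$ are assumptions made for the example. It compares the action of the two kinds of orthogonal matrices on a point $(x, y)$ with the complex formulas $z \mapsto e^{i\theta}z$ and $z \mapsto e^{i\theta}\bar{z}$.

```python
import cmath
import math


def rotation(theta):
    """Orthogonal matrix of the first kind (determinant +1)."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))


def reflection(theta):
    """Orthogonal matrix of the second kind (determinant -1)."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, s), (s, -c))


def act(A, z):
    """Apply the real 2 x 2 matrix A to z = x + iy, viewed as the column (x, y)."""
    x, y = z.real, z.imag
    return complex(A[0][0] * x + A[0][1] * y, A[1][0] * x + A[1][1] * y)


def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


theta, z = 0.7, complex(2.0, -1.5)           # arbitrary sample values
first_case = act(rotation(theta), z)          # compare with e^{i theta} z
second_case = act(reflection(theta), z)       # compare with e^{i theta} conj(z)
```

Up to rounding, `first_case` agrees with `cmath.exp(1j * theta) * z` and `second_case` with `cmath.exp(1j * theta) * z.conjugate()`.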


EXAMPLES OF GROUPS 


EXAMPLE 3.45. We have already encountered several groups. We list these as well 
as some new groups and some non-groups. We have grouped the examples under several 
categories. The reader should verify the axioms for the various groups and find the reason 
why other examples are not groups. 


EXAMPLES BASED ON INTEGERS AND OTHER NUMBER SYSTEMS 
(1) $(\mathbb{Z}, +)$ is an abelian group (and $|\mathbb{Z}| = \infty$).

(2) So is $(n\mathbb{Z}, +)$ for every integer $n$, where
\[ n\mathbb{Z} = \{jn ;\ j \in \mathbb{Z}\} \]
(and $|n\mathbb{Z}| = \infty$ if $n \neq 0$). In language to be developed, $n\mathbb{Z}$ is a normal subgroup of $\mathbb{Z}$.

(3) $(\mathbb{Z}_{>a}, +)$ is not a group for any $a \in \mathbb{Z}$ since the set is not closed under inverses.

(4) Neither is $(\mathbb{Z}, \cdot)$.

(5) For every positive integer $n$, $(\mathbb{Z}_n, +)$ is an abelian group and $|\mathbb{Z}_n| = n$. Its identity
element is $[0]_n$.

(6) $(\{\pm 1\}, \cdot)$ is an abelian group with 2 elements. So is $(\mathbb{Z}_2, +)$. As we shall see later
these are the same groups.

(7) For each $n \in \mathbb{Z}_{>0}$, the $n$-th roots of unity (these are complex numbers of the form
$e^{\frac{2\pi i k}{n}}$ with $k \in \mathbb{Z}$, $0 \le k < n$), $U_n$, form a commutative group of size $n$ under
multiplication.

(8) For $n \in \mathbb{Z}_{>1}$, $(\mathbb{Z}_n, \cdot)$ is not a group, but it has an identity element $[1]_n$. However,
$(\mathbb{Z}_n^*, \cdot)$ is a group with $\varphi(n)$ members.


(9) The rationals (Q), the reals (R) and the complex numbers (C) are each infinite 
abelian groups under addition. If we remove the zero element from these (obtaining 
Q*, R* and C*), we obtain infinite abelian groups under multiplication. 


(10) Complex numbers of absolute value 1 form an infinite abelian group under multi- 
plication. 


(11) We can use the complex numbers to construct another finite abelian group $C_4$ under
multiplication:
\[ C_4 = \{\pm 1, \pm i\} \]
consisting of 4 elements. Its multiplication table is

The multiplication table for $C_4$.

\[
\begin{array}{r||r|r|r|r}
   & 1 & -1 & i & -i \\ \hline\hline
 1 & 1 & -1 & i & -i \\ \hline
-1 & -1 & 1 & -i & i \\ \hline
 i & i & -i & -1 & 1 \\ \hline
-i & -i & i & 1 & -1
\end{array}
\]

(12) We now explore a less familiar example: a number system $\mathbb{H}$ known as the quaternions.
We first consider three undefined new symbols $i$, $j$ and $k$. The set $\mathbb{H}$ is to
consist of expressions of the form $a + bi + cj + dk$ with $a$, $b$, $c$ and $d \in \mathbb{R}$. (We are really
considering 4 quantities 1, $i$, $j$ and $k$. The last formal sum is then $a1 + bi + cj + dk$.)
If we view 1, $i$, $j$ and $k$ as basis elements of a real 4-dimensional vector space, we
obtain the additive structure on the quaternions $(\mathbb{H}, +)$. In this structure

\[ (a_1 1 + b_1 i + c_1 j + d_1 k) + (a_2 1 + b_2 i + c_2 j + d_2 k) = (a_1 + a_2)1 + (b_1 + b_2)i + (c_1 + c_2)j + (d_1 + d_2)k. \]

To obtain a product structure for the quaternions, we must merely describe how
to multiply the 4 basis elements and then let the usual rules of arithmetic take
over. We want 1 to be the identity element under the multiplication. So the 9
products among the other 3 basis elements must be specified. We require that

\[ i^2 = j^2 = k^2 = -1,\quad ij = k,\quad ji = -k,\quad jk = i,\quad kj = -i,\quad ki = j \quad\text{and}\quad ik = -j. \]

Under these rules

\[
\begin{aligned}
(a_1 1 + b_1 i + c_1 j + d_1 k)(a_2 1 + b_2 i + c_2 j + d_2 k)
 ={}& (a_1a_2 - b_1b_2 - c_1c_2 - d_1d_2)1 + (a_1b_2 + b_1a_2 + c_1d_2 - d_1c_2)i \\
 &+ (a_1c_2 - b_1d_2 + c_1a_2 + d_1b_2)j + (a_1d_2 + b_1c_2 - c_1b_2 + d_1a_2)k.
\end{aligned}
\]

We leave two questions for the reader to resolve. If we remove the zero element from
$\mathbb{H}$, do we get a group under multiplication? An abelian group? The quaternions $\mathbb{H}$
contain a very interesting finite subset, the quaternion group consisting of 8 elements

\[ H_8 = \{\pm 1, \pm i, \pm j, \pm k\}. \]

It is a tedious but routine matter to construct

The multiplication table for $H_8$.

The entry in the row labeled $p$ and the column labeled $q$ is the product $pq$ (in this order).

\[
\begin{array}{r||r|r|r|r|r|r|r|r}
   & 1 & -1 & i & -i & j & -j & k & -k \\ \hline\hline
 1 & 1 & -1 & i & -i & j & -j & k & -k \\ \hline
-1 & -1 & 1 & -i & i & -j & j & -k & k \\ \hline
 i & i & -i & -1 & 1 & k & -k & -j & j \\ \hline
-i & -i & i & 1 & -1 & -k & k & j & -j \\ \hline
 j & j & -j & -k & k & -1 & 1 & i & -i \\ \hline
-j & -j & j & k & -k & 1 & -1 & -i & i \\ \hline
 k & k & -k & j & -j & -i & i & -1 & 1 \\ \hline
-k & -k & k & -j & j & i & -i & 1 & -1
\end{array}
\]

The entries in the above table are enough to convince us that all the group axioms
except possibly associativity are satisfied. We will easily see that associativity holds
too when we study Example (19), below. Is the group we have constructed abelian?
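The product rule for quaternions lends itself to a quick machine check. The sketch below is an illustration under the assumption that a quaternion $a + bi + cj + dk$ is stored as the 4-tuple `(a, b, c, d)`; the names `qmul`, `neg` and `H8` are hypothetical. It implements the product formula displayed above and rebuilds the multiplication table of the eight elements $\pm 1, \pm i, \pm j, \pm k$.

```python
from itertools import product


def qmul(p, q):
    """Product of quaternions p = (a1, b1, c1, d1) and q = (a2, b2, c2, d2)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def neg(q):
    return tuple(-t for t in q)


one, i, j, k = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
H8 = [one, neg(one), i, neg(i), j, neg(j), k, neg(k)]

# table[r][c] is the product of the r-th element with the c-th (in this order)
table = [[qmul(p, q) for q in H8] for p in H8]

closed = all(entry in H8 for row in table for entry in row)
associative = all(
    qmul(qmul(p, q), r) == qmul(p, qmul(q, r)) for p, q, r in product(H8, repeat=3)
)
```

Both `closed` and `associative` evaluate to `True`; the brute-force associativity check over 512 triples is independent of the matrix argument given later in Example (19).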


GROUPS OF PERMUTATIONS

(13) We have seen that for every non-empty set $X$, the set of permutations of $X$,
Perm($X$), forms a group under composition. This group is finite if and only if
$|X|$ is finite and abelian if and only if $|X| \le 2$.

(14) For every positive integer $n$, the sets $S(n)$ and $A(n)$ form groups under composition.
In language to be established, $A(n)$ is a normal subgroup of $S(n)$. Here $|S(n)| = n!$
and for $n \geq 2$, $|A(n)| = \frac{n!}{2}$. If $n > 2$, $S(n)$ is not commutative; neither is $A(n)$ for
$n > 3$.

(15) If we choose any $\pi \in S(n)$, then the powers of $\pi$
\[ \langle \pi \rangle = \{\pi^m ;\ m \in \mathbb{Z}\} \]
form a group with $o(\pi)$ elements. There are many more groups of permutations.
For example, the four elements of $S(n)$, $n \geq 4$,

\[ \mathrm{id},\ (1,2)(3,4),\ (1,3)(2,4),\ (1,4)(2,3) \]

form a group $G$. The easiest way to verify the closure property is to construct

The multiplication table for $G$.

\[
\begin{array}{c||c|c|c|c}
 & \mathrm{id} & (1,2)(3,4) & (1,3)(2,4) & (1,4)(2,3) \\ \hline\hline
\mathrm{id} & \mathrm{id} & (1,2)(3,4) & (1,3)(2,4) & (1,4)(2,3) \\ \hline
(1,2)(3,4) & (1,2)(3,4) & \mathrm{id} & (1,4)(2,3) & (1,3)(2,4) \\ \hline
(1,3)(2,4) & (1,3)(2,4) & (1,4)(2,3) & \mathrm{id} & (1,2)(3,4) \\ \hline
(1,4)(2,3) & (1,4)(2,3) & (1,3)(2,4) & (1,2)(3,4) & \mathrm{id}
\end{array}
\]




We also note as a result of the last calculation that each element of G is its own 
inverse. Why is this not surprising? 


GROUPS OF MATRICES 

(16) Let $n$ be a positive integer. Recall⁵ that an $n \times n$ matrix is invertible if it has an
inverse with respect to matrix multiplication. The sets of invertible $n \times n$ matrices
over the integers⁶ ($GL(n, \mathbb{Z})$), rationals ($GL(n, \mathbb{Q})$), reals ($GL(n, \mathbb{R})$), and complex
numbers ($GL(n, \mathbb{C})$) each form a group under multiplication. The verification of the
group axioms for these sets can be based on two facts from linear algebra. An $n \times n$
matrix with integer entries has an inverse with integer entries if and only if its determinant is $\pm 1$; while
in any of the other three cases, a matrix is invertible if and only if its determinant is $\neq 0$. Note that we
have the proper inclusions

\[ GL(n, \mathbb{Z}) \subset GL(n, \mathbb{Q}) \subset GL(n, \mathbb{R}) \subset GL(n, \mathbb{C}). \]

The reader unfamiliar with elementary matrix theory should verify the group axioms
for $n = 2$.

(17) Upper triangular invertible matrices form a group under multiplication. An $n \times n$
matrix $A = [a_{ij}]$ is upper triangular if $a_{ij} = 0$ for all $i > j$.

(18) Diagonal invertible matrices form a group under multiplication. An $n \times n$ matrix
$A = [a_{ij}]$ is diagonal if $a_{ij} = 0$ for all $i \neq j$.


(19) Let us define two $2 \times 2$ matrices

\[ X = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \quad\text{and}\quad Y = \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix}. \]

As usual we denote by $I$ the identity matrix, in this case of size $2 \times 2$: $I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$. Simple
calculations show that
\[ X^2 = Y^2 = -I \quad\text{and}\quad XY = -YX. \]
Define
\[ Z = XY, \]
and calculate (once again) to see that
\[ Z^2 = -I,\quad YZ = X,\quad ZY = -X,\quad ZX = Y \quad\text{and}\quad XZ = -Y. \]
Thus the 8 matrices
\[ \pm I,\ \pm X,\ \pm Y,\ \pm Z \]
have the same multiplication as the quaternion group $H_8$ (Example (12), above),
with $\pm 1$ in $H_8$ corresponding to $\pm I$ in this example; $\pm i$, to $\pm X$; $\pm j$, to $\pm Y$ and $\pm k$,
to $\pm Z$. Since we know that matrix multiplication is associative, we conclude that
so is the multiplication in $H_8$.

(ASIDE TO THOSE WHO REMEMBER THE CONCEPT OF A LINEAR
MAP.) If we send the quaternion $a + bi + cj + dk$ (here $a$, $b$, $c$ and $d$ are real numbers)
to the $2 \times 2$ matrix $aI + bX + cY + dZ$, then we have obtained an injective
linear map from the quaternions $\mathbb{H}$ (viewed as a real vector space) into the $2 \times 2$
complex matrices, viewed as a real vector space.

⁵ From linear algebra courses.
⁶ An $n \times n$ matrix $A$ with integer entries may be invertible and still not belong to $GL(n, \mathbb{Z})$. It belongs
to $GL(n, \mathbb{Z})$ if and only if so does $A^{-1}$.
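The matrix relations of Example (19) are easy to confirm by machine. The sketch below is illustrative only: it assumes the matrices $X = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ and $Y = \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix}$, stores $2 \times 2$ matrices as nested tuples of Python complex numbers, and uses the hypothetical helpers `matmul` and `scale`.

```python
def matmul(A, B):
    """Product of two 2 x 2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(A[r][t] * B[t][c] for t in range(2)) for c in range(2))
        for r in range(2)
    )


def scale(s, A):
    return tuple(tuple(s * entry for entry in row) for row in A)


I = ((1, 0), (0, 1))
X = ((0, -1), (1, 0))
Y = ((1j, 0), (0, -1j))
Z = matmul(X, Y)
minus_I = scale(-1, I)

# the eight matrices +-I, +-X, +-Y, +-Z
eight = [M for base in (I, X, Y, Z) for M in (base, scale(-1, base))]
closed = all(matmul(A, B) in eight for A in eight for B in eight)
```

The entries here are exact (only $0$, $\pm 1$ and $\pm i$ occur), so equality tests between matrices are reliable, and `closed` evaluates to `True`.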


(20) The set $SL(2, \mathbb{Z})$ of $2 \times 2$ matrices with integer coefficients and determinant 1 forms
a group under matrix multiplication. How does $SL(2, \mathbb{Z})$ differ from $GL(2, \mathbb{Z})$?

(21) Let $p$ be a prime. An example closely related to the last one is the set $SL(2, \mathbb{Z}_p)$
of $2 \times 2$ matrices whose entries are mod $p$ congruence classes of integers and whose
determinant is $[1]_p$. Thus an element of $SL(2, \mathbb{Z}_p)$ is a matrix
\[ A = \begin{bmatrix} [a]_p & [b]_p \\ [c]_p & [d]_p \end{bmatrix} \]
with $[a]_p[d]_p - [b]_p[c]_p = [1]_p$. A number of routine calculations are needed to verify
that $SL(2, \mathbb{Z}_p)$ is a group under matrix multiplication. One shows, in particular, that
the inverse of the matrix $A$ is
\[ A^{-1} = \begin{bmatrix} [d]_p & -[b]_p \\ -[c]_p & [a]_p \end{bmatrix}. \]
We can view the elements of $SL(2, \mathbb{Z}_p)$ as $2 \times 2$ integral matrices $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ with the integers $a$, $b$, $c$ and $d$
restricted to the values in $\{0, 1, \ldots, p-1\}$ and replacing all results of calculations
by the mod $p$ equivalent integer from this set. But despite the use of this notation
$SL(2, \mathbb{Z}_p)$ is NOT a subgroup of $SL(2, \mathbb{Z})$.
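The description of $SL(2, \mathbb{Z}_p)$ by integer matrices with entries in $\{0, 1, \ldots, p-1\}$ translates directly into code. The following sketch is an illustration with hypothetical function names (`mat_mul`, `det`, `inverse`) and an arbitrarily chosen sample matrix; every result is reduced mod $p$, as described above.

```python
def mat_mul(A, B, p):
    """Product of 2 x 2 matrices with all entries reduced to {0, ..., p-1}."""
    return tuple(
        tuple(sum(A[r][t] * B[t][c] for t in range(2)) % p for c in range(2))
        for r in range(2)
    )


def det(A, p):
    return (A[0][0] * A[1][1] - A[0][1] * A[1][0]) % p


def inverse(A, p):
    """Inverse of a determinant-one matrix: swap the diagonal, negate the rest."""
    (a, b), (c, d) = A
    return ((d % p, -b % p), (-c % p, a % p))


p = 7
A = ((2, 3), (1, 2))                 # determinant 4 - 3 = 1
identity = ((1, 0), (0, 1))
product_right = mat_mul(A, inverse(A, p), p)
product_left = mat_mul(inverse(A, p), A, p)

# all of SL(2, Z_3), listed by brute force
sl2_z3 = [
    ((a, b), (c, d))
    for a in range(3) for b in range(3) for c in range(3) for d in range(3)
    if det(((a, b), (c, d)), 3) == 1
]
```

Both products reduce to the identity matrix, and the brute-force list for $p = 3$ can be used to test closure under multiplication.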


GROUPS OF SYMMETRIES
These groups arise as symmetries of a geometric shape $F$; meaning orthogonal
affine transformations of the plane $\mathbb{R}^2$ or 3-space $\mathbb{R}^3$ which leave invariant the fixed
geometric figure $F$.

(22) (Rigid motions of an equilateral triangle.) We start with an equilateral triangle $T$
and label its three vertices 1, 2 and 3, say in counter-clockwise order, to enable us
to keep track of the motions we discuss. The three altitudes of $T$ meet in a point $O$.
Observe that any symmetry of $T$ must map vertices of $T$ to vertices and sides of
$T$ to sides. Furthermore every symmetry of $T$ is completely described by its action
on the vertices of this triangle.

The group of symmetries of $T$ is often called $D(3)$. We proceed to describe
it in detail. We define the first motion $\rho$ as counter-clockwise rotation about $O$
through an angle of $\frac{2\pi}{3}$. It is clear that this motion is a symmetry of $T$. It is
completely described (as a motion preserving $T$) by its action on the vertices; hence
by the permutation $(1,2,3) \in S(3)$ of these 3 points. A second motion $R$ that
we introduce is reflection in the perpendicular bisector of the side connecting the
vertices 1 and 2 (this line passes through the mid-point of this side and the vertex
3). This motion is described by the transposition $(1,2) \in S(3)$. The figure $T$ has, of
course, many other symmetries. All of them will be described in terms of $\rho$, $R$ and
the identity map ($\mathrm{id} \in S(3)$) that we denote by $e$. We introduce a multiplication on
the set of symmetries of $T$: if $\sigma$ and $\tau$ are such symmetries, then $\sigma\tau$ is defined as the
symmetry $\tau$ followed by the symmetry $\sigma$. (There is a good reason for this choice.
We are viewing symmetries as maps and hence multiplication should correspond to
composition. As a bonus, it also corresponds to multiplication of permutations.)
This multiplication is associative. It is clear that the inverse of a symmetry is
again a symmetry; it undoes what the original symmetry did. Let us observe that
$\rho^3 = R^2 = e$ and start listing some of the symmetries we have:

\[ \{e,\ \rho,\ \rho^2,\ R,\ \rho R \text{ and } \rho^2 R\}. \]

These six motions are distinct as can be seen by examining their action on the
vertices of $T$. There can be no other symmetries since there are at most 6 permutations
of the vertices. Thus the last set coincides with $D(3)$ and is hence closed under
multiplication. The construction of the multiplication table of $D(3)$ is simplified by
the relations⁷

\[ \rho^3 = e = R^2 \quad\text{and}\quad \rho^2 R = R\rho. \tag{12} \]

The first two of these relations are obvious from the definitions. A geometric argument
proves the last one. The reader is invited to provide one. The impatient
reader could consult [7, pg. 186] or read the similar argument in the next example
and adapt it to the current situation. We illustrate the calculation involved by
considering two cases:

\[ (\rho R)(\rho R) = \rho(R\rho)R = \rho(\rho^2 R)R = \rho^3 R^2 = e \]
and
\[ (\rho^2 R)\rho^2 = (R\rho)\rho^2 = R\rho^3 = R. \]


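The multiplication table for $D(3)$.

\[
\begin{array}{c||c|c|c|c|c|c}
 & e & \rho & \rho^2 & R & \rho R & \rho^2 R \\ \hline\hline
e & e & \rho & \rho^2 & R & \rho R & \rho^2 R \\ \hline
\rho & \rho & \rho^2 & e & \rho R & \rho^2 R & R \\ \hline
\rho^2 & \rho^2 & e & \rho & \rho^2 R & R & \rho R \\ \hline
R & R & \rho^2 R & \rho R & e & \rho^2 & \rho \\ \hline
\rho R & \rho R & R & \rho^2 R & \rho & e & \rho^2 \\ \hline
\rho^2 R & \rho^2 R & \rho R & R & \rho^2 & \rho & e
\end{array}
\]

If we place our triangle $T$ on a coordinate system (a copy of $\mathbb{R}^2$ or $\mathbb{C}$) with origin
at $O$ such that the base (for definiteness, take the base to be the side joining vertices
1 and 2) of $T$ is parallel to the $x$-axis (the horizontal axis), then we can realize⁸ the
motions $\rho$ and $R$ as orthogonal $2 \times 2$ matrices:

\[ \rho = \begin{bmatrix} -\frac{1}{2} & -\frac{\sqrt{3}}{2} \\ \frac{\sqrt{3}}{2} & -\frac{1}{2} \end{bmatrix} \quad\text{and}\quad R = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}. \]

It is easier if we think of these as motions of $\mathbb{C}$:
\[ z \mapsto e^{\frac{2\pi i}{3}} z \quad\text{and}\quad z \mapsto -\bar{z}. \]

Even if we did not know how to derive these motions, we should easily be able
to check that as self-maps of $\mathbb{C}$ or $\mathbb{R}^2$ they do the right thing. To do so we may
scale our triangle so that its vertices lie on the unit circle and vertex 3 has complex
coordinates $i = e^{\frac{\pi i}{2}}$. Thus vertices 1 and 2 must have coordinates $-\frac{\sqrt{3}}{2} - \frac{i}{2} = e^{\frac{7\pi i}{6}}$
and $\frac{\sqrt{3}}{2} - \frac{i}{2} = e^{\frac{11\pi i}{6}}$, respectively. Hence these two motions do act as expected on
the vertices.

⁷ There are, of course, others. But all the relations in $D(3)$ are consequences of these three.
⁸ As a consequence of the material in the last worksheet, for example.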


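(23) (Rigid motions of a square.) Place a square $S$ in the complex plane with vertices at
$-1-i$ (labeled vertex 1), $1-i$ (labeled 2), $1+i$ (labeled 3) and $-1+i$ (labeled 4).
Hence the center of $S$ is at the origin $O$ of the plane. We define the rigid motion $\rho$
as rotation about $O$ through an angle of $\frac{\pi}{2}$ (represented by the self-map of $\mathbb{C}$, $z \mapsto iz$)
and $R$ as reflection in the perpendicular bisector of the edge joining the vertices 1
and 2 (represented by $z \mapsto -\bar{z}$). The relations among these eight maps

\[ \{e,\ \rho,\ \rho^2,\ \rho^3,\ R,\ \rho R,\ \rho^2 R,\ \rho^3 R\} \tag{13} \]
are
\[ \rho^4 = e = R^2 \quad\text{and}\quad \rho^3 R = R\rho, \tag{14} \]

as can easily be checked using the geometric interpretation of the symmetries. The
multiplication table for the set of these 8 elements, that we call $D(4)$, is easily
calculated, using only these relations, to be

The multiplication table for $D(4)$.

\[
\begin{array}{c||c|c|c|c|c|c|c|c}
 & e & \rho & \rho^2 & \rho^3 & R & \rho R & \rho^2 R & \rho^3 R \\ \hline\hline
e & e & \rho & \rho^2 & \rho^3 & R & \rho R & \rho^2 R & \rho^3 R \\ \hline
\rho & \rho & \rho^2 & \rho^3 & e & \rho R & \rho^2 R & \rho^3 R & R \\ \hline
\rho^2 & \rho^2 & \rho^3 & e & \rho & \rho^2 R & \rho^3 R & R & \rho R \\ \hline
\rho^3 & \rho^3 & e & \rho & \rho^2 & \rho^3 R & R & \rho R & \rho^2 R \\ \hline
R & R & \rho^3 R & \rho^2 R & \rho R & e & \rho^3 & \rho^2 & \rho \\ \hline
\rho R & \rho R & R & \rho^3 R & \rho^2 R & \rho & e & \rho^3 & \rho^2 \\ \hline
\rho^2 R & \rho^2 R & \rho R & R & \rho^3 R & \rho^2 & \rho & e & \rho^3 \\ \hline
\rho^3 R & \rho^3 R & \rho^2 R & \rho R & R & \rho^3 & \rho^2 & \rho & e
\end{array}
\]

The fact that these motions are closed under multiplication, as shown by the above
table, proves that $D(4)$ is a group. As in the case of the triangle, the motions $\rho$ and
$R$ can be represented as $2 \times 2$ matrices:

\[ \rho = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \quad\text{and}\quad R = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} \]

and as (described above by the) self-maps of $\mathbb{C}$:
\[ \rho : z \mapsto iz \quad\text{and}\quad R : z \mapsto -\bar{z}. \tag{15} \]

These rigid motions can also be described as permutations of the vertices of $S$
(recall that a product $\sigma\tau$ means $\tau$ followed by $\sigma$):

\[
\begin{array}{l|l}
\text{Rigid motion} & \text{Permutation} \\ \hline
e & \mathrm{id} \\
\rho & (1,2,3,4) \\
\rho^2 & (1,3)(2,4) \\
\rho^3 & (1,4,3,2) \\
R & (1,2)(3,4) \\
R\rho = \rho^3 R & (2,4) \\
R\rho^2 = \rho^2 R & (1,4)(2,3) \\
R\rho^3 = \rho R & (1,3)
\end{array}
\]

The second and fifth lines of the above table determine the other six lines, of
course. We see from the above table that only 8 of the 24 permutations in $S(4)$
land in the group we have called $D(4)$. We claim that $D(4)$ is the full group of
rigid motions of $S$. There should be a reason why only 8 of the elements of $S(4)$
correspond to motions of the square. To see why, consider the 6 lines joining the
4 vertices of $S$. We label the (un-oriented) line joining the vertices $a$ and $b$ by $ab$.
A rigid motion of $S$ can send 13 to either 13 or 24, while an arbitrary permutation
on 4 symbols can send 13 to any of the six lines. We conclude that $D(4)$ is the full
group of rigid motions of the square.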
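The permutation description of $D(4)$ can be explored with a few lines of code. The sketch below is an illustration under stated assumptions: a permutation of the vertices is stored as a tuple whose entry in position $i$ is the image of vertex $i$, products are composed with the right-hand factor acting first (as in the text), and the helper names `compose` and `generate` are hypothetical.

```python
def compose(f, g):
    """(f g)(i) = f(g(i)): the motion g followed by the motion f."""
    return tuple(f[g[i] - 1] for i in range(len(g)))


def generate(generators):
    """Closure of a set of permutations under composition (a finite group)."""
    identity = tuple(range(1, len(generators[0]) + 1))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


e = (1, 2, 3, 4)
rho = (2, 3, 4, 1)      # the 4-cycle (1,2,3,4)
R = (2, 1, 4, 3)        # the product (1,2)(3,4)

D4 = generate([rho, R])
rho2 = compose(rho, rho)
rho3 = compose(rho, rho2)

# every motion sends the diagonal 13 to the diagonal 13 or to the diagonal 24
diagonals_preserved = all({g[0], g[2]} in ({1, 3}, {2, 4}) for g in D4)
```

The set `D4` has 8 elements, and `compose(R, rho)` equals `compose(rho3, R)`, in agreement with the relation $\rho^3 R = R\rho$.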


(24) (Rigid motions of a regular $n$-gon, $n \geq 2$.) A group can be defined by a set of
generators subject to a set of relations satisfied by the generators. In the examples
$D(3)$ and $D(4)$ discussed above, the generators are $\rho$ (a different one in each case)
and $R$ (in some sense, the same in all cases), and the relations are (12) and (14). To
be specific, let $n \in \mathbb{Z}_{>1}$. We construct a group $D(n)$, the dihedral $n$-group, on
generators $\rho$ and $R$ subject to the relations

\[ \rho^n = e = R^2 \quad\text{and}\quad \rho^{n-1} R = R\rho. \]

These relations are sufficient to construct the multiplication table for the group (it
has $2n$ elements). Geometrically the group represents the rigid motions of a regular
$n$-gon (a regular $n$ sided polygon). The definitions for $n = 3$ and 4 agree, of course,
with our earlier definitions of the groups $D(3)$ and $D(4)$, respectively.
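The claim that the relations suffice to build the multiplication table can be illustrated as follows. This is a sketch under assumptions not spelled out in the text: every element is written in the normal form $\rho^a R^b$ with $0 \le a < n$ and $b \in \{0, 1\}$ and stored as the pair `(a, b)`, and the rule $R\rho^c = \rho^{-c}R$ (obtained by applying $R\rho = \rho^{n-1}R$ repeatedly) is used to bring products back to normal form. The names `dihedral_mul` and `dihedral_table` are hypothetical.

```python
from itertools import product


def dihedral_mul(x, y, n):
    """Multiply rho^a R^b by rho^c R^d using R rho^c = rho^(-c) R."""
    a, b = x
    c, d = y
    if b == 0:
        return ((a + c) % n, d)
    return ((a - c) % n, (1 + d) % 2)


def dihedral_table(n):
    """All 2n normal forms and the dictionary (x, y) -> xy."""
    elements = [(a, b) for b in (0, 1) for a in range(n)]
    table = {(x, y): dihedral_mul(x, y, n) for x, y in product(elements, repeat=2)}
    return elements, table


n = 4
elements, table = dihedral_table(n)
e, rho, R = (0, 0), (1, 0), (0, 1)

associative = all(
    dihedral_mul(dihedral_mul(x, y, n), z, n) == dihedral_mul(x, dihedral_mul(y, z, n), n)
    for x, y, z in product(elements, repeat=3)
)
```

For `n = 4` the entry `table[(R, rho)]` is `(3, 1)`, that is $R\rho = \rho^3 R$, matching relation (14).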


(25) (Rigid motions of a rectangle.) Let $\mathcal{R}$ be a rectangle which is not a square. To
describe the symmetries of $\mathcal{R}$, we note that every element of this group must also
be a symmetry of $S$. So, we need to determine which of the eight elements of $D(4)$
fix $\mathcal{R}$. The motion $\rho$ certainly does not. Only a little bit of thought is required to
convince us that only the four motions

\[ e,\ \rho^2,\ R,\ \rho^2 R \]

have the required property. The multiplication table for these motions is easily
constructed. Observe that for each rigid motion $r$ of $\mathcal{R}$, $r^2 = e$.

(26) Groups can also be associated with the study of solutions of algebraic
equations. We will discuss some of these after we describe some additional algebraic
structures in §1 of Chapter 7.


EXERCISES 


(1) Let X be a non-empty set. When is the group Perm(X) cyclic (see Definition 4.15 
of Chapter 4)? 

(2) Two of the rigid motions of the equilateral triangle were described as motions of R? 
and then as motions of C. Describe the other 4 as motions of these vector spaces, 
and then construct the multiplication table for these 6 motions in the two models. 
Show that you obtained (after relabeling) once again the multiplication table for 
S(3) and D(3). 

(3) Verify that the multiplication (composition) table for the 8 self-maps of $\mathbb{C}$ given
in (13), where the maps $\rho$ and $R$ are defined by (15), is exactly the same as the
multiplication table for $D(4)$.

(4) Verify that after relabeling of the elements, the multiplication table for $D(4)$ coincides
with that for the 8 permutations considered in the MAPLE program in §1.

(5) Use MAPLE or MATHEMATICA to construct the multiplication table for D(5). 

(6) Discuss the geometric realization of D(2). What is the underlying geometric shape, 
the regular 2-gon? Can you identify D(2) with another group? 

(7) Identify the group of rigid motions of a rectangle (that is not a square) with a group 
encountered before. 

(8) Explain why the rotation group of the octahedron is isomorphic to S(4). 

(9) What is the rotation group of the icosahedron?


CHAPTER 4 
Group homomorphisms and isomorphisms. 


The first two sections of the chapter are devoted to basic group theory. In the third 
section, we begin the study of homomorphisms, maps between groups that preserve the 
group structure. The fourth section is devoted to the study of groups of small order. The 
final section continues the study of homomorphisms. 


1. Elementary group theory 


This section deals with some of the elementary foundational results in group theory. The 
discussion parallels and generalizes part of our discussion of permutation groups. 


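THEOREM 4.1. Let $a$ and $b$ be elements in a group $G$. There exist unique elements $x$ and
$y \in G$ such that $a = bx$ and $a = yb$.


PROOF. It is easily seen that $x = b^{-1}a$ and $y = ab^{-1}$ have the required properties. They are the
only solutions: if $a = bx$, then multiplying on the left by $b^{-1}$ gives $x = b^{-1}a$, and similarly for $y$.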


REMARK 4.2. If G is abelian, then x = y, of course. 


COROLLARY 4.3 (Cancellation law). If g, h and b belong to a group G and bg = bh, then 
g=h. Similarly, if gb = hb, then g = h. 


PROOF. The first assertion follows from the uniqueness of $x$ in the theorem. But this
seems to be a rather tortuous way to obtain the conclusion, which follows by multiplying
each side of $bg = bh$ by $b^{-1}$ on the left. The proof of the second assertion is similar.


COROLLARY 4.4. Let $a$ and $b$ be elements of a group $G$, then $(b^{-1})^{-1} = b$ and $(ab)^{-1} =
b^{-1}a^{-1}$.


PROOF. Take $a = e$ in the theorem and note that both $b$ and $(b^{-1})^{-1}$ solve $e = b^{-1}x$.
Again this assertion follows from the uniqueness of inverses, as does the last claim in the
statement of the corollary, since $(ab)(b^{-1}a^{-1}) = a(bb^{-1})a^{-1} = e$.


The powers of an element g in a group G are defined exactly the way we defined the 
powers of a permutation 7 € S(n). We merely substitute g for each occurrence of 7 and G 
for each occurrence of S(n) in Definition 3.13 of Chapter 3. 


DEFINITION 4.5. Let $g$ be an element of a group $G$. Set $g^0 = e$. For $k \in \mathbb{Z}_{>0}$, define
inductively $g^k = gg^{k-1}$. Also define $g^{-k} = (g^{-1})^k$.


PROPOSITION 4.6. Let g and h be elements in a group G and let r and s be integers. 
Then 




PROOF. The required argument is identical to the one in the proof of Proposition 3.14 
of Chapter 3. 


DEFINITION 4.7. An element $g$ in a group $G$ has finite order if there exists a positive
integer $m$ such that $g^m = e$, and the order of $g$ is the smallest such $m$; we say that $g$ has
infinite order (or its order is $\infty$) if it does not have finite order. We let $o(g)$ be the order of
$g$. Thus $o(g)$ is either a positive integer or $\infty$. The number of elements $|G|$ in a group $G$ is
also called its order, $o(G)$. So, $|G| = o(G) \in \mathbb{Z}_{>0} \cup \{\infty\}$.


REMARK 4.8. 1. If $n$ is a positive integer, then every $\pi \in S(n)$ has finite order in the
above sense and its order as defined above agrees with its order as a permutation as defined
in Chapter 3.

2. If the group $G$ has finitely many elements (we shall say in this case that $G$ is a finite
group), then every one of its members has finite order.

3. The matrix $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ has infinite order in the group $SL(2, \mathbb{Z})$, since for every
integer $n$, $A^n = \begin{bmatrix} 1 & n \\ 0 & 1 \end{bmatrix}$.

4. As an element of the group $SL(2, \mathbb{Z}_p)$, with $p$ a prime, the matrix $\begin{bmatrix} [1]_p & [1]_p \\ [0]_p & [1]_p \end{bmatrix}$
has finite order $p$.
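The last two items of the remark can be checked by direct computation. The sketch below is illustrative; it assumes that matrices over $\mathbb{Z}_p$ are stored as nested tuples of integers in $\{0, \ldots, p-1\}$, and the names `mat_mul` and `order` are hypothetical. The function `order` simply multiplies by the matrix until the identity is reached, which terminates for an invertible matrix mod $p$.

```python
def mat_mul(A, B, p):
    """Product of 2 x 2 matrices with entries reduced mod p."""
    return tuple(
        tuple(sum(A[r][t] * B[t][c] for t in range(2)) % p for c in range(2))
        for r in range(2)
    )


def order(A, p):
    """Smallest m > 0 with A^m equal to the identity mod p."""
    identity = ((1, 0), (0, 1))
    power, m = A, 1
    while power != identity:
        power = mat_mul(power, A, p)
        m += 1
    return m


A = ((1, 1), (0, 1))
orders = {p: order(A, p) for p in (2, 3, 5, 7, 11)}
```

For each prime in the sample the computed order equals $p$, reflecting the fact that $A^n$ has upper right entry $n$.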
We now come to another key idea; the concept of a substructure. 


DEFINITION 4.9. A non-empty subset H of a group (G,*) is a subgroup (of G) if it is a
group under the binary operation * (restricted to H × H). It is a proper subgroup if it is
≠ G.


REMARK 4.10. 1. A subgroup H of G always contains the identity e ∈ G. We verify
this elementary fact. Since H is a group, it contains an identity element e'. Since H ⊂ G,
e' ∈ G. Now ee' = e' because e' ∈ G and e is the identity in G, and e'e' = e' because e' is
the identity of H. Thus by the cancellation law (in G), e = e'.


2. Every group G has at least one subgroup; namely the trivial subgroup with one ele- 
ment; the identity element of G. All groups that contain more than one element have a 
second subgroup; namely the group G itself. 


EXAMPLE 4.11. We have been discussing subgroups all along. 


(1) The set of even integers 2Z is a subgroup of (Z, +).
(2) Each of the following set-theoretic inclusions is a subgroup inclusion (in the first
set of inclusions the group operation is addition; in the second, multiplication):


Z ⊂ Q ⊂ R ⊂ C


and


{±1} ⊂ Q* ⊂ R* ⊂ C*.
(3) SL(2, Z) is a subgroup of SL(2, R).
(4) But for any prime p, SL(2, Z_p) is not a subgroup of SL(2, Z).




The next proposition gives easy tests for determining when subsets of a group G are
subgroups.


PROPOSITION 4.12. Let H be a non-empty subset of a group G. The following conditions
are equivalent:
(a) H is a subgroup of G.
(b) For all x and y ∈ H, x^{-1} and xy ∈ H.
(c) For all x and y ∈ H, xy^{-1} ∈ H.


PROOF. Assume for the moment that H is closed under the multiplication it inherits
from G. Since the product operation is the same for H and G, the multiplication in H is
certainly associative. So in addition to closure, to show that H is a subgroup of G, we need
to show that H contains the identity e of G and that the inverse of every element in H
belongs to H. We are now ready to show that (a) ⇒ (b) ⇒ (c) ⇒ (a). We start with (a).
Then (b) follows from the fact that H is a group. Now if (b) holds, then y^{-1} ∈ H and hence
so is xy^{-1}. So (b) implies (c). Finally if (c) is true, then there exists an x ∈ H (since H is
non-empty) and by taking y = x, we see that e = xx^{-1} ∈ H. To see that the inverse of every
element y ∈ H belongs to H, take x = e. To see that H is closed under multiplication of x
by y (with both in H), observe that we already know that y^{-1} ∈ H and thus x(y^{-1})^{-1} ∈ H.
But (y^{-1})^{-1} = y. We conclude that (c) implies (a).
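
Condition (c) is convenient for machine checking because it is a single closure test. A minimal sketch for subsets of the additive group (Z_n, +), where xy^{-1} becomes x - y; the function name is_subgroup_additive is a hypothetical choice made for this illustration:

```python
def is_subgroup_additive(H, n):
    """Test condition (c) in (Z_n, +): H is non-empty and x - y lies in H for all x, y in H."""
    H = set(H)
    return bool(H) and all((x - y) % n in H for x in H for y in H)

# Candidate subsets of Z_12, with classes represented by 0, 1, ..., 11.
multiples_of_3 = {0, 3, 6, 9}      # closed under subtraction
not_closed = {0, 1, 2}             # 0 - 1 = 11 is missing
results = (
    is_subgroup_additive(multiples_of_3, 12),
    is_subgroup_additive(not_closed, 12),
    is_subgroup_additive(set(), 12),
)
```

The empty set fails the test, which is why the proposition assumes that H is non-empty.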


The next two propositions provide methods for constructing subgroups of a given group. 
PROPOSITION 4.13. If H and K are subgroups of a group G, then so is H ∩ K.


PROOF. The set H ∩ K is not empty since it contains e. Now if x and y ∈ H ∩ K, then
these elements belong to both H and K and because these are subgroups, so does xy^{-1}; that
says xy^{-1} ∈ H ∩ K.


PROPOSITION 4.14. Let G be a group and x an element of order n in G. Then the
distinct powers of x,
<x> = {x^m; m ∈ Z}
form a commutative subgroup of G containing n elements, called the cyclic subgroup of G
generated by x.


PROOF. The set <x> is not empty since it contains e = x^0. If r and s ∈ Z, so that x^r and
x^s ∈ <x>, then <x> also contains (x^r)(x^s)^{-1} = x^{r-s}.


To apply the above concept to abstract groups, rather than just subgroups of a given 
group, we introduce the next 


DEFINITION 4.15. A group G is said to be cyclic with generator g if there exists an
element g ∈ G such that
G = {g^m; m ∈ Z}.
In this case we write G = <g> to indicate that the group G is generated by the element g.
In general, we write
G = <g_1, g_2, ...>


to indicate that G is generated by the elements g_1, g_2, ..., and
G = <g_1, g_2, ...; R_1, R_2, ...>
to indicate that G is generated by the elements g_1, g_2, ... subject to the relations R_1, R_2, ....




EXERCISES 


(1) Let G be a group and g ∈ G. If o(g) = n ∈ Z_{>0}, show that for all r ∈ N, o(g^r) = n/(n, r), where (n, r) denotes the greatest common divisor of n and r.


(2) Is Z a group under subtraction? 
(3) Is the intersection of two cyclic subgroups of a group also cyclic? 


2. Lagrange’s theorem 


A remarkably simple way of decomposing groups will lead us to some surprisingly strong 
consequences. The key is a theorem due to Lagrange. 


DEFINITION 4.16. Let H be a subgroup of G and let a ∈ G. We define a left coset (of
H in G)
aH = {ah; h ∈ H}.
A right coset Ha is defined similarly.


REMARK 4.17. Several observations are in order. 

(1) We restrict all remarks, propositions, theorems and examples to left cosets. Similar 
statements can of course be made for right cosets. 

(2) Since H = eH, H is its own left coset. 

(3) Since a = ae, a € aH. 

(4) If b ∈ aH, then bH = aH. Assume that b = ah_0 with h_0 ∈ H. Then for all h ∈ H,
bh = ah_0h ∈ aH and hence bH ⊂ aH. Conversely, for all h ∈ H, ah = bh_0^{-1}h ∈ bH
showing that aH ⊂ bH.

(5) aH is a subgroup of G if and only if a ∈ H. If a ∈ H, then aH = H and thus aH
is a subgroup. Conversely, if aH is a subgroup, then it contains e and thus e = ah
for some h ∈ H. Thus a = h^{-1} ∈ H.

(6) If we take H = G, we see that there is only one coset of G in G. If we take H = {e},
then we see that the coset a{e} of {e} in G consists of the set with one element
{a}.

(7) Since a ∈ aH,

∪_{a ∈ G} aH = G.
(8) For commutative groups G, left and right cosets agree. 


EXAMPLE 4.18. We have already encountered some cosets and we should examine some 
new ones. 


(1) Let us fix a positive integer n. We know that nZ is a subgroup of (Z,+). Only a
little thought is required to conclude that for all a ∈ Z,


a + nZ = [a]_n.


Why are we using additive notation a + nZ for the left coset? Does it differ from a
right coset nZ + b?
(2) Let G = (Z_6, +) and H = {[0]_6, [3]_6}, then


[1]_6 + H = {[1]_6, [4]_6} = [4]_6 + H
and in general
[a]_6 + H = [a + 3]_6 + H, for all [a]_6 ∈ G.
Thus there are 3 left cosets of H in G.


(3) Let G = S(3) and H = {id, (1,2,3), (1,3,2)}. Then
(1,2)H = {(1,2), (2,3), (1,3)},
and there are only 2 left cosets of H in G. See also the next Proposition.
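
The coset count in the last example can be reproduced by brute force. A minimal sketch, assuming permutations of {1, 2, 3} are stored as tuples of images on {0, 1, 2} and composed right to left (apply the right factor first); the name compose is a hypothetical helper, not notation from the text:

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

G = list(permutations(range(3)))          # the 6 elements of S(3)
r = (1, 2, 0)                             # the 3-cycle (1,2,3), written on {0,1,2}
H = {(0, 1, 2), r, compose(r, r)}         # H = {id, (1,2,3), (1,3,2)}

# Each left coset aH is stored as a frozenset so that equal cosets collapse.
cosets = {frozenset(compose(a, h) for h in H) for a in G}
```

The set cosets has two members, each with three elements, and together they exhaust G.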


PROPOSITION 4.19. Let a and b be elements of a group G and let H be a subgroup of G.
Then either aH = bH or aH ∩ bH = ∅.


PROOF. If aH ∩ bH ≠ ∅, then it contains an element c = ax = by, where x and y ∈ H.
Thus b = axy^{-1} and it follows that b ∈ aH. By Item 4 of Remark 4.17, aH = bH.


REMARK 4.20. A subgroup H of a group G introduces an equivalence relation R on G,
where for x and y ∈ G, xRy if and only if y^{-1}x ∈ H.


PROPOSITION 4.21. Let H be a subgroup of G and let a ∈ G. Then |H| = |aH|.


PROOF. The map which sends h ∈ H to ah ∈ aH is a bijection.


THEOREM 4.22 (Lagrange). If H is a subgroup of a finite group G, then |H| divides |G|.


PROOF. The group G can be decomposed as a finite union of disjoint cosets:


G = ∪_{i=1}^{m} a_iH.


Each coset a_iH has |H| elements by Proposition 4.21. Hence |G| = m|H|.


COROLLARY 4.23. Let g be an element of a finite group G, then o(g) divides |G|.


PROOF. The observation that o(g) = |<g>| reduces the corollary to a special case of
the theorem.
the theorem. 


DEFINITION 4.24. Let H be a subgroup of a finite group G. The index of H in G, [G : H],
is defined as the number of distinct cosets of H in G. In this language, Lagrange's theorem
may be written as


o(G) = [G : H] o(H).


REMARK 4.25. Lagrange’s theorem (its last corollary, in a more strict sense) is a gener- 
alization of two of our earlier results. We show that these earlier results follow from our last 
corollary. 


• (Fermat) If p is a prime and a ∈ Z is not a multiple of p, then a^{p-1} ≡ 1 mod p.


PROOF. We use the group (Z_p^*, ·). It contains p - 1 elements and since it contains
[a]_p, o([a]_p) divides p - 1.


• (Euler) For every positive integer n and all integers a that are relatively prime to
n, a^{φ(n)} ≡ 1 mod n.


PROOF. We repeat the argument used above to prove Fermat. The group now
is (Z_n^*, ·). It contains φ(n) elements and since [a]_n has a multiplicative inverse in
Z_n, it belongs to Z_n^*. As before, o([a]_n) divides the number of elements in the group
Z_n^*, namely φ(n).


• We see once again that Euler is a generalization of Fermat.
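
Both statements are easy to test for small moduli. A minimal sketch, in which the helper names units and order are hypothetical and φ(n) is computed simply as the number of units modulo n:

```python
from math import gcd

def units(n):
    """Representatives of the group Z_n^*: residues in 1..n that are coprime to n."""
    return [a for a in range(1, n + 1) if gcd(a, n) == 1]

def order(a, n):
    """Multiplicative order of [a]_n, assuming gcd(a, n) == 1."""
    m, power = 1, a % n
    while power != 1 % n:
        power = (power * a) % n
        m += 1
    return m

# For every modulus n and every unit a: o([a]_n) divides phi(n), hence a^phi(n) = 1 mod n.
lagrange_holds = all(
    len(units(n)) % order(a, n) == 0 and pow(a, len(units(n)), n) == 1
    for n in range(2, 60)
    for a in units(n)
)
```

For prime n the list units(n) has n - 1 elements, so the same check covers the statement of Fermat.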




3. Homomorphisms 


DEFINITION 4.26. A map θ : G → H between groups is a homomorphism if
θ(xy) = θ(x)θ(y) for all x and y ∈ G.


(That is, if it preserves the group structure. Multiplication on the left hand side of the
last equation is in the group G, while the multiplication on the right hand side is in H.)
The map θ is an isomorphism if it is also a bijection. In this case, θ^{-1} : H → G is also an
isomorphism. The groups G and H are isomorphic if there exists an isomorphism θ : G → H
between them and we then write G ≅ H.


PROPOSITION 4.27. If θ : G → H is a homomorphism, then θ(e) = e and for all x ∈ G,
θ(x^{-1}) = (θ(x))^{-1}.

PROOF. In the first claim, the first e is the identity e_G of G and the second, e_H, of H,
of course. To establish it, we note that θ(e_G) = θ(e_G e_G) = θ(e_G)θ(e_G). If we multiply both
sides of the last equation (ignoring the middle term) by θ(e_G)^{-1} on either the left or the
right, we conclude that e_H = θ(e_G). For the second claim, we note that


e_H = θ(e_G) = θ(xx^{-1}) = θ(x)θ(x^{-1})


and


e_H = θ(e_G) = θ(x^{-1}x) = θ(x^{-1})θ(x),


so θ(x^{-1}) is the inverse of θ(x) in H.


EXERCISE 4.28. Let n be a positive integer. Reduction mod n is the map
red_n : Z → Z_n


that sends each integer m to its congruence class modulo n. It is obviously a surjective
group homomorphism with respect to the respective additive (abelian) group structures on
Z and Z_n.


PROPOSITION 4.29. Let G be a group where every element other than e has order 2.
Then G is abelian.


PROOF. The hypothesis guarantees that for all x ∈ G, x^{-1} = x. Hence for all y and
x ∈ G,


xy = (xy)^{-1} = y^{-1}x^{-1} = yx.


THEOREM 4.30. Let n ∈ Z_{>0}. Every cyclic group of order n is isomorphic to (Z_n, +).


PROOF. If g is the generator of a cyclic group G of order n, the isomorphism of G onto
Z_n sends g to 1.


EXERCISES 


(1) Show that the relation R introduced in Remark 4.20 is an equivalence relation and
then verify that for each x ∈ G, the equivalence class


[x] = {y ∈ G; yRx}


is the same as the left coset xH.

(2) Show that for each positive integer n, (Z_n, +) is a cyclic group of order n and that
it is isomorphic to (U_n, ·) and to the group of permutations <(1,2,3,...,n)>.

(3) Let n be a positive integer. Show that any two cyclic groups of order n are
isomorphic. As a result of this fact, we use the symbol Z_n or Z_n = <1> to denote
such a group when we use additive notation and U_n or U_n = <e^{2πi/n}> when we use
multiplicative notation; in each case the second form also describes the generator of
the group.

(4) Let (G,-) be a group. 

• Show that the map that sends x ∈ G to its square x^2 = x·x is a homomorphism
of G into itself if and only if the group is abelian.

• Conclude that for abelian groups the elements that are their own inverses and
the elements that are squares are each subgroups of G, H and K, respectively.

• Are either of the last two statements true for non abelian groups?

• What is the intersection of H and K?


4. Groups of small order 


Let G be a finite group with n elements. In this section we describe all such groups with
n ≤ 8. We need some preliminary results that we proceed to establish. The first of these is
the beginning of the classification theory of finite groups.


THEOREM 4.31. A finite group G of prime order p is cyclic.
PROOF. Let e ≠ g ∈ G. Then
1 < o(g) | o(G) = p,


and thus o(g) = p. It follows that <g> is a subgroup of G of order p and hence the
inclusion of <g; g^p = e> into G is an isomorphism.


REMARK 4.32. (Z_p, +) is a good model (representative) for the isomorphism class of
cyclic groups of order p. The convenient generator for (Z_p, +) is [1]_p, although [a]_p will do as
long as a ∈ Z - pZ.


THEOREM 4.33. Let G be a group and a and b two of its members. Assume that a has
finite order n > 1 and that b^2 = a. If n is odd assume further that b ∉ <a>. Then
o(b) = 2o(a).


PROOF. If n = 2, then b ∉ <a>. For if b ∈ <a>, then b = e or b = a. Both of these
possibilities contradict the fact that b^2 = a.

Now b^4 = a^2 = e. Thus o(b)|4 and the only possibilities are o(b) = 1, 2 or 4. The first
of these implies that b = e which is impossible. The second implies that b^2 = e, which
is also impossible since it would say that a = e. We conclude that o(b) = 4. We have
established the theorem if n = 2. So assume that n > 2. We show first that if n is even, then
(automatically) b ∉ <a>. For if b = a^r with 2 ≤ r ≤ (n - 1) (note that as before, b ≠ e
and b ≠ a), then a = b^2 = a^{2r}. Thus e = a^{2r-1} and n|(2r - 1). Since 3 ≤ 2r - 1 ≤ 2n - 3,
this forces 2r - 1 = n, which is impossible for even n; so it cannot be that b^2 = a. So in all
cases, b ∉ <a>. Now b^{2n} = a^n = e and thus o(b)|2n. Let s = o(b) (hence b^s = e). Thus
a^s = b^{2s} = e and hence n|s. We claim that s ≠ n. This claim would imply that s must be at
least 2n and hence = 2n. To verify the last claim, we assume (for contradiction) that s = n.
If n is even (remember it is ≥ 4), then a^{n/2} = b^n = e which contradicts the fact that a has
order n. If n is odd (remember it is ≥ 3), then a^{(n+1)/2} = b^{n+1} = b which contradicts the
fact that b ∉ <a>.


REMARK 4.34. If n is odd, then the assumption that b ∉ <a> is needed for the conclusion
to hold. For in this case, we can choose b = a^{(n+1)/2} and observe that


b^2 = a and o(b) = o(a).
DEFINITION 4.35. Let (G, *_1) and (H, *_2) be groups. We introduce a binary operation *
on the direct product, G × H, of G and H by the rule
(g_1, h_1) * (g_2, h_2) = (g_1 *_1 g_2, h_1 *_2 h_2), for g_1 and g_2 ∈ G, h_1 and h_2 ∈ H.
PROPOSITION 4.36. If G and H are groups, so is G × H. For finite groups G and H,
|G × H| = |G| |H|.


PROOF. The group axioms are easily verified for G × H. For example, the identity for
G × H is (e_G, e_H) (which will be written as e = (e,e)) and the inverse of (a,b) ∈ G × H is
(a^{-1}, b^{-1}).


THEOREM 4.37. If n and m are relatively prime positive integers, then
Z_n × Z_m ≅ Z_{nm}.


PROOF. We make a few observations that should help the reader provide a proof of this
result. Let a be a generator for Z_n and b for Z_m. Thus o(a) = n and o(b) = m. It follows
that¹ nm(a,b) = (m(na), n(mb)) = 0. Thus o(a,b)|nm. Since for every positive integer k,
k(a,b) = (ka, kb), we conclude that o(a,b) is a multiple of o(a) and o(b) and since these are
relatively prime, a multiple of their product.


REMARK 4.38. The above theorem is a special case of the Chinese remainder theorem ; 
see 5.40 which contains a proof of the above version. 


EXAMPLE 4.39. The hypothesis in the last theorem that n and m be relatively prime is 
necessary. To see this we construct the 


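ADDITION TABLE FOR Z_4 × Z_2
which we write additively since the group is abelian (here 0 = (0,0))


   +   ||   0   | (1,0) | (2,0) | (3,0) | (0,1) | (1,1) | (2,1) | (3,1)

   0   ||   0   | (1,0) | (2,0) | (3,0) | (0,1) | (1,1) | (2,1) | (3,1)
 (1,0) || (1,0) | (2,0) | (3,0) |   0   | (1,1) | (2,1) | (3,1) | (0,1)
 (2,0) || (2,0) | (3,0) |   0   | (1,0) | (2,1) | (3,1) | (0,1) | (1,1)
 (3,0) || (3,0) |   0   | (1,0) | (2,0) | (3,1) | (0,1) | (1,1) | (2,1)
 (0,1) || (0,1) | (1,1) | (2,1) | (3,1) |   0   | (1,0) | (2,0) | (3,0)
 (1,1) || (1,1) | (2,1) | (3,1) | (0,1) | (1,0) | (2,0) | (3,0) |   0
 (2,1) || (2,1) | (3,1) | (0,1) | (1,1) | (2,0) | (3,0) |   0   | (1,0)
 (3,1) || (3,1) | (0,1) | (1,1) | (2,1) | (3,0) |   0   | (1,0) | (2,0)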


¹We use additive notation because all the groups under consideration in this argument are abelian.




Using the table, we compute the orders of the elements of the group Z_4 × Z_2.


element || 0 | (1,0) | (2,0) | (3,0) | (0,1) | (1,1) | (2,1) | (3,1)
order   || 1 |   4   |   2   |   4   |   2   |   4   |   2   |   4


If Z_4 × Z_2 were cyclic, it would have order 8 (thus be isomorphic to Z_8) and hence contain
an element of order 8. But none of its members have this order.
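
The order table, and the role of the coprimality hypothesis in Theorem 4.37, can be checked by direct computation. A minimal sketch; the function names order_in_product and is_cyclic are hypothetical:

```python
from math import gcd

def order_in_product(a, b, n, m):
    """Order of (a, b) in Z_n x Z_m under componentwise addition."""
    k, x, y = 1, a % n, b % m
    while (x, y) != (0, 0):
        x, y = (x + a) % n, (y + b) % m
        k += 1
    return k

def is_cyclic(n, m):
    """Z_n x Z_m is cyclic exactly when some element has order nm."""
    return any(
        order_in_product(a, b, n, m) == n * m
        for a in range(n)
        for b in range(m)
    )

# Orders of the 8 elements of Z_4 x Z_2, listed in the same order as in the table.
elements = [(a, b) for b in range(2) for a in range(4)]
orders = [order_in_product(a, b, 4, 2) for a, b in elements]
```

Here orders reproduces the row of orders in the table, and is_cyclic(n, m) agrees with the condition gcd(n, m) = 1 for the small values tested.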


We proceed to describe all groups of order ≤ 8. We should keep in mind that Z_n × Z_m
has order ≤ 8 as long as nm ≤ 8.


4.1. |G| = 1. In this case G = <e> = {e}.


4.2. |G| = 2, 3, 5, 7 and, in fact, all primes. By Theorem 4.31, for each prime p
there is only one (cyclic) group (up to isomorphisms) of size p (namely, Z_p).


4.3. |G| = 4. If G has order 4, then its nontrivial elements can only have orders 4 and
2. If G has an element a of order 4, then it is cyclic and isomorphic to Z_4. Otherwise all its
elements, other than e, are of order 2 by Lagrange's theorem. By Proposition 4.29, G must
be abelian. Choosing two distinct elements a and b in G of order 2, we conclude that


G = {a, b; a^2 = e = b^2, ab = ba},


and thus G is isomorphic to Z_2 × Z_2.


4.4. |G| = 6. If G contains an element of order 6, then it is isomorphic to (Z_6, +). By
Lagrange's theorem, the only other possibility is for all elements of G other than e to have
orders 2 or 3.

So assume that G has no element of order 6. If G were also not to have an element of
order 3, then it would have to be an abelian group by Proposition 4.29. Let a and b be two
distinct elements of G of order 2. Then {a, b; a^2 = e = b^2, ab = ba} would be a subgroup of
G of order 4, contradicting Lagrange's theorem.

We conclude that G has an element a of order 3 and thus H = {e, a, a^2} is a subgroup of
G. Let b ∈ G - H. We consider the 6 elements of G: {e, a, a^2, b, ba, ba^2}; the first 4 of these
are certainly distinct. If ba = b, then a = e, and if ba = a^r for some r ∈ Z, then b = a^{r-1}.
We conclude that the first 5 elements in our last list are distinct. If ba^2 = ba^s for s = 1 or
0, then a^{2-s} = e which is impossible. Similarly if ba^2 = a^r for some r ∈ Z, then b = a^{r-2}.
Thus the 6 elements in our list are distinct (hence this is the complete list of members of G)
and we need only establish the multiplication table for these elements. The element b must
have order 2 or 3. Let us try to compute b^2. If b^2 ≠ e, it would have to be a^r with r = 1 or
2 or ba^s with s = 0, 1 or 2. If b^2 = a, then b has order 6; a contradiction. If b^2 = a^2 ≠ e,
then b cannot have order 2. Also b^3 = ba^2 ≠ e; that is, b cannot have order 3; the last
two statements yield a contradiction. Thus b^2 ≠ a^r. If b^2 = ba^s, then b = a^s, which is also
impossible. We have reached the conclusion that b has order 2. Let us see what we know at
this point about the multiplication table for G.


      ||  e   |  a   | a^2  |  b   |  ba  | ba^2
  e   ||  e   |  a   | a^2  |  b   |  ba  | ba^2
  a   ||  a   | a^2  |  e   |      |      |
 a^2  || a^2  |  e   |  a   |      |      |
  b   ||  b   |  ba  | ba^2 |  e   |  a   | a^2
  ba  ||  ba  | ba^2 |  b   |      |      |
 ba^2 || ba^2 |  b   |  ba  |      |      |


We need to compute ab. There are only three possibilities: ab = ba^r with r = 0, 1 or 2. The
first of these, ab = b, is impossible because it would imply that a = e. The second, ab = ba,
would tell us that (ab)^s = a^s b^s for all integers s and we would conclude that ab has order 6
(remember the only possibilities are 2, 3 and 6). Thus ab = ba^2. We now easily complete
the multiplication table for G.


      ||  e   |  a   | a^2  |  b   |  ba  | ba^2

  e   ||  e   |  a   | a^2  |  b   |  ba  | ba^2

  a   ||  a   | a^2  |  e   | ba^2 |  b   |  ba
 a^2  || a^2  |  e   |  a   |  ba  | ba^2 |  b
  b   ||  b   |  ba  | ba^2 |  e   |  a   | a^2
  ba  ||  ba  | ba^2 |  b   | a^2  |  e   |  a
 ba^2 || ba^2 |  b   |  ba  |  a   | a^2  |  e


A comparison of the above multiplication table with the one for S(3) shows that the group G is
isomorphic to S(3). An isomorphism θ : G → S(3) can be chosen to satisfy


θ(a) = (1,2,3) and θ(b) = (1,2).
It follows that


θ(e) = id, θ(a) = (1,2,3), θ(a^2) = (1,2,3)(1,2,3) = (1,3,2),
θ(b) = (1,2), θ(ba) = (1,2)(1,2,3) = (2,3), θ(ba^2) = (1,2)(1,3,2) = (1,3).


We have shown that a group of order 6 is isomorphic to either Z_6 or S(3).
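
The claim that the relations a^3 = e, b^2 = e, ab = ba^2 describe S(3) can be verified mechanically. The sketch below is an illustration under stated assumptions: the element b^i a^j is encoded as the pair (i, j), permutations of {1, 2, 3} are stored as tuples of images on {0, 1, 2} and composed right to left, and the names mult, theta, compose and power are hypothetical.

```python
def compose(p, q):
    """Return p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(3))

def power(p, k):
    """k-fold composition of p with itself (the identity for k = 0)."""
    result = (0, 1, 2)
    for _ in range(k):
        result = compose(result, p)
    return result

A = (1, 2, 0)   # the 3-cycle (1,2,3)
B = (1, 0, 2)   # the transposition (1,2)

def mult(x, y):
    """(b^i a^j)(b^k a^l) = b^(i+k) a^(l + (-1)^k j), a consequence of ab = ba^2 = ba^(-1)."""
    (i, j), (k, l) = x, y
    return ((i + k) % 2, (l + (j if k == 0 else -j)) % 3)

def theta(x):
    """Candidate isomorphism: b^i a^j is sent to (1,2)^i (1,2,3)^j."""
    i, j = x
    return compose(power(B, i), power(A, j))

G = [(i, j) for i in range(2) for j in range(3)]
is_bijection = len({theta(x) for x in G}) == 6
is_homomorphism = all(
    theta(mult(x, y)) == compose(theta(x), theta(y)) for x in G for y in G
)
```

Both flags come out true, so theta is an isomorphism of the abstractly presented group onto S(3).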
4.5. |G| = 8. We will see that in this case there are 5 groups up to isomorphisms: 3
abelian groups (Z_8, Z_4 × Z_2 and Z_2 × Z_2 × Z_2) and 2 non-commutative groups (D(4) and
H,) 


Let G be a group of order 8. If G contains an element g of order 8, then G = <g> and
is hence isomorphic to Z_8. By Lagrange's theorem the only other possibility is for all the
nontrivial (≠ e) elements of G to have orders 4 or 2.

We now assume that G does not contain an element of order 8 and assume for the moment
that G is abelian. There are two cases to consider.
(a) If G has an element a of order 4, then H = <e, a, a^2, a^3> is a subgroup of order 4 of G
and we may choose an element c ∈ G - H. Lagrange's theorem now tells us that G = H ∪ cH.
The element c has order 4 or order 2. If o(c) = 2, let b = c. If o(c) = 4, then o(c^2) = 2.
If c^2 ∈ G - H, then we let b = c^2. If c^2 ∈ H, then because it has order 2, c^2 = a^2 and it
follows that o(ca) = 2 and we let b = ca. We have shown that G contains an element b ∉ H
of order 2. We thus conclude that G = H ∪ bH. Because the group G is commutative, we
have enough information to complete its multiplication table
      ||  e   |  a   | a^2  | a^3  |  b   |  ba  | ba^2 | ba^3
  e   ||  e   |  a   | a^2  | a^3  |  b   |  ba  | ba^2 | ba^3
  a   ||  a   | a^2  | a^3  |  e   |  ba  | ba^2 | ba^3 |  b
 a^2  || a^2  | a^3  |  e   |  a   | ba^2 | ba^3 |  b   |  ba
 a^3  || a^3  |  e   |  a   | a^2  | ba^3 |  b   |  ba  | ba^2
  b   ||  b   |  ba  | ba^2 | ba^3 |  e   |  a   | a^2  | a^3
  ba  ||  ba  | ba^2 | ba^3 |  b   |  a   | a^2  | a^3  |  e
 ba^2 || ba^2 | ba^3 |  b   |  ba  | a^2  | a^3  |  e   |  a
 ba^3 || ba^3 |  b   |  ba  | ba^2 | a^3  |  e   |  a   | a^2


An analysis of the table shows that G is isomorphic to Z_4 × Z_2; the isomorphism θ may be
chosen to satisfy
θ(a) = (1,0) and θ(b) = (0,1).


(b) We are left with the possibility that every nontrivial element of G has order 2. Let
us choose two distinct elements of order 2 in G: a and b. We have already seen that the
subgroup H = {e, a, b, ab} of G of order 4 is isomorphic to Z_2 × Z_2 (the case |G| = 4). So G
must contain another element c of order 2 and G = H ∪ cH. It is now easy to construct the
multiplication table for the group and conclude that it is isomorphic to Z_2 × Z_2 × Z_2.

It remains to consider non-abelian groups G of order 8. Such groups must contain
elements of order 4 and choosing such an element a, we conclude that H = <e, a, a^2, a^3> is
a subgroup of order 4 of G. We next choose an element b ∈ G - H. Then o(b) = 4 or o(b) = 2.
In either case G = H ∪ bH and ab ≠ ba; for if ab = ba, then (a^r)(a^s) = (a^s)(a^r) = a^{r+s},
(a^r)(ba)^s = (ba)^s(a^r) = a^{r+s}b^s and (ba)^r(ba)^s = (ba)^s(ba)^r = a^{r+s}b^{r+s} for all nonnegative
integers r and s and G would be abelian. We consider separately the two cases.

(a) We first study the case where all the elements of G - H have order 4. We need to
compute b^2 and ab. Since b^2 has order 2 and such elements can be found only in H, we
conclude that b^2 = a^2. We know that ab ≠ a^r (for all r ∈ N) and ab ≠ ba. Certainly, ab ≠ b.
Thus ab = ba^r with r = 2 or 3. We show that r = 2 cannot occur. For if ab = ba^2, then
(ab)^2 = (ab)(ba^2) = ab^2a^2 = a^5 = a and since ab ∉ H, it must have order 8. This would
imply that G is cyclic. Thus we are left with a group G generated by two elements a and b
subject to the relations

a^4 = e, b^2 = a^2 and ab = ba^3.
This suffices to construct the multiplication table for the group.
This suffices to construct the multiplication table for the group. 


      ||  e   |  a   | a^2  | a^3  |  b   |  ba  | ba^2 | ba^3

  e   ||  e   |  a   | a^2  | a^3  |  b   |  ba  | ba^2 | ba^3
  a   ||  a   | a^2  | a^3  |  e   | ba^3 |  b   |  ba  | ba^2
 a^2  || a^2  | a^3  |  e   |  a   | ba^2 | ba^3 |  b   |  ba
 a^3  || a^3  |  e   |  a   | a^2  |  ba  | ba^2 | ba^3 |  b
  b   ||  b   |  ba  | ba^2 | ba^3 | a^2  | a^3  |  e   |  a
  ba  ||  ba  | ba^2 | ba^3 |  b   |  a   | a^2  | a^3  |  e
 ba^2 || ba^2 | ba^3 |  b   |  ba  |  e   |  a   | a^2  | a^3
 ba^3 || ba^3 |  b   |  ba  | ba^2 | a^3  |  e   |  a   | a^2




An analysis of the table shows that G is isomorphic to the quaternion group; an
isomorphism θ being defined by
θ(a) = i and θ(b) = -j.

(b) We are left to consider the possibility of the existence of an element b ∈ G - H with
o(b) = 2. We need only evaluate ab. As before ab = ba^r with r = 2 or 3. Again, the case
r = 2 is impossible; for if ab = ba^2, then (ab)^2 = (ab)(ba^2) = a^3 and we once again would be
able to conclude that G is abelian. So in this case, the group is generated by two elements
a and b subject to the relations


a^4 = e, b^2 = e and ab = ba^3.


This suffices to conclude that the group is isomorphic to the group of symmetries of the
square D(4).
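
The two sets of relations found for the non-abelian groups of order 8 can be tested on concrete models. The following sketch rests on assumptions that are not part of the text: quaternions are stored as 4-tuples (w, x, y, z) standing for w + xi + yj + zk with a = i and b = -j; the symmetries of the square are stored as permutations of its vertices 0, 1, 2, 3 with a a rotation by a quarter turn and b a reflection; all function names are hypothetical.

```python
def qmul(p, q):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def pmul(p, q):
    """Compose permutations of {0,1,2,3}: apply q first, then p."""
    return tuple(p[q[i]] for i in range(4))

def generate(gens, mul, e):
    """All products of the generators; for a finite group this is the generated subgroup."""
    elements, frontier = {e}, [e]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = mul(x, g)
            if y not in elements:
                elements.add(y)
                frontier.append(y)
    return elements

def power(x, k, mul, e):
    result = e
    for _ in range(k):
        result = mul(result, x)
    return result

def order(x, mul, e):
    k, y = 1, x
    while y != e:
        y, k = mul(y, x), k + 1
    return k

one = (1, 0, 0, 0)
qa, qb = (0, 1, 0, 0), (0, 0, -1, 0)       # a = i, b = -j
ident = (0, 1, 2, 3)
da, db = (1, 2, 3, 0), (0, 3, 2, 1)        # a = rotation, b = reflection

Q = generate([qa, qb], qmul, one)
D = generate([da, db], pmul, ident)

# Case (a): a^4 = e, b^2 = a^2, ab = ba^3.
q_relations = (
    power(qa, 4, qmul, one) == one
    and qmul(qb, qb) == qmul(qa, qa)
    and qmul(qa, qb) == qmul(qb, power(qa, 3, qmul, one))
)
# Case (b): a^4 = e, b^2 = e, ab = ba^3.
d_relations = (
    power(da, 4, pmul, ident) == ident
    and pmul(db, db) == ident
    and pmul(da, db) == pmul(db, power(da, 3, pmul, ident))
)
q_involutions = sum(1 for x in Q if order(x, qmul, one) == 2)
d_involutions = sum(1 for x in D if order(x, pmul, ident) == 2)
```

Both models have 8 elements and satisfy their relations. They contain different numbers of elements of order 2, so the two non-abelian groups are not isomorphic to each other.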


EXERCISES 

(1) Construct an isomorphism from S(3) to D(3).

(2) The group Z_3 × Z_2 has order 6. Hence it is isomorphic to either Z_6 or S(3). Which
is it? Construct the isomorphism.

(3) Let G be the group of order 8 with the property that all its elements other than
e have order 2. Compute its multiplication table and hence show that there is an
isomorphism of G onto Z_2 × Z_2 × Z_2.

(4) Describe all possible groups of order 9. 

(5) Describe all the subgroups of (4). Which of these are isomorphic? 

(6) Let n be a positive integer. Describe the set of generators for (Z_n, +).

(7) Prove Theorem 4.37. 


5. Homomorphisms and quotients 


Homomorphisms and isomorphisms between groups θ : G → H were defined (Definition
4.26) previously. An injective homomorphism is also called a monomorphism, and a
surjective homomorphism, an epimorphism. Thus a homomorphism is an isomorphism if and
only if it is both a monomorphism and an epimorphism. Homomorphisms preserve much of
group structure, while isomorphisms are essentially relabellings of the “same” groups. An
isomorphism of a group onto itself will be called an automorphism of the group.


DEFINITION 4.40. Let G be a group. A subgroup H ⊂ G is normal if gHg^{-1} ⊂ H for all
g ∈ G (that is, ghg^{-1} ∈ H for all g ∈ G and all h ∈ H). Two elements f and f' of a group
G are conjugate if there exists a g ∈ G such that f' = gfg^{-1}.


REMARK 4.41. The condition gHg^{-1} ⊂ H is, of course, equivalent to H ⊂ g^{-1}Hg. Hence
H is a normal subgroup of G if and only if gHg^{-1} = H for all g ∈ G.


PROPOSITION 4.42. Let θ : G → H be a homomorphism between groups, then θ^{-1}(e) is
a normal subgroup of G called the kernel of θ (in symbols ker(θ)). The image of θ,


Im(θ) = {h ∈ H; h = θ(g) for some g ∈ G}
is a subgroup of H.
PROOF. We know that e ∈ ker(θ) and if g_1 and g_2 ∈ ker(θ), then
θ(g_1g_2^{-1}) = θ(g_1)(θ(g_2))^{-1} = e.
Thus also g_1g_2^{-1} ∈ ker(θ). Hence ker(θ) is a subgroup of G. If g ∈ G and h ∈ ker(θ), then
θ(ghg^{-1}) = θ(g)θ(h)θ(g^{-1}) = θ(g) e (θ(g))^{-1} = e,


and thus ker(θ) is a normal subgroup of G.
Certainly e_H ∈ Im(θ). If h_1 and h_2 ∈ Im(θ), then for i = 1 and 2, there exist g_i ∈ G such
that h_i = θ(g_i). Hence


h_1h_2^{-1} = θ(g_1)θ(g_2^{-1}) = θ(g_1g_2^{-1}) ∈ Im(θ),
and Im(θ) is a subgroup of H.
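
A small numerical illustration of the proposition, using an example chosen only for this sketch and not taken from the text: the map of (Z_12, +) into itself that sends [x] to [3x]. The names theta and is_subgroup are hypothetical.

```python
n = 12

def theta(x):
    """Hypothetical example: the map Z_12 -> Z_12 sending [x] to [3x]."""
    return (3 * x) % n

def is_subgroup(S):
    """Subgroup test in (Z_12, +): non-empty and closed under x - y."""
    S = set(S)
    return bool(S) and all((x - y) % n in S for x in S for y in S)

G = range(n)
is_homomorphism = all(
    theta((x + y) % n) == (theta(x) + theta(y)) % n for x in G for y in G
)
kernel = sorted(x for x in G if theta(x) == 0)
image = sorted({theta(x) for x in G})
```

Here the kernel is {0, 4, 8} and the image is {0, 3, 6, 9}; both pass the subgroup test. Normality of the kernel is automatic in this example because the group is abelian.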


REMARK 4.43. • In general, for a fixed element g in a group G and a fixed
subgroup H ⊂ G, the map
θ_g : h ↦ ghg^{-1}
is an isomorphism of H onto the subgroup gHg^{-1} ⊂ G, called conjugation (by g).
PROOF. First we observe that gHg^{-1} is a subgroup (it certainly is a subset) of
G. We note that for all h_1 and h_2 ∈ H,


(gh_1g^{-1})(gh_2g^{-1})^{-1} = gh_1g^{-1}gh_2^{-1}g^{-1} = g(h_1h_2^{-1})g^{-1} ∈ gHg^{-1};


thus gHg^{-1} is a subgroup of G. The map θ = θ_g preserves multiplication since for h_1
and h_2 ∈ H,


θ(h_1h_2) = g(h_1h_2)g^{-1} = gh_1g^{-1}gh_2g^{-1} = (gh_1g^{-1})(gh_2g^{-1}) = θ(h_1)θ(h_2).


If ghg^{-1} = h', then h = g^{-1}h'g. Thus θ(h) = e implies that h = e. It is clear that
θ(H) = gHg^{-1}.


• We say that two subgroups H_1 and H_2 ⊂ G are conjugate if there exists a g ∈ G
such that H_2 = gH_1g^{-1}.

• Let H be a subgroup of the group G. Then H is normal if and only if gH = Hg
for all g ∈ G. Thus for a normal subgroup H, left and right cosets coincide.


PROOF. Assume that H is normal in G. Fix g ∈ G. Define a map θ : gH → Hg
by θ(gh) = hg, h ∈ H. This map is injective since h_1g = h_2g for h_1 and h_2 ∈ H
implies that gh_1gg^{-1} = gh_2gg^{-1} or gh_1 = gh_2. It is obviously surjective. Moreover,
for every h ∈ H, normality gives gh = (ghg^{-1})g ∈ Hg and hg = g(g^{-1}hg) ∈ gH, so
that gH = Hg as sets. Conversely, if gH = Hg for all g ∈ G, then gHg^{-1} = H for all
g ∈ G and hence H is a normal subgroup of G.


• Every non-trivial group has at least two normal subgroups: the group itself and its
trivial subgroup.

• All subgroups of an abelian group are normal.

• Every group of prime order has precisely two distinct subgroups; each is normal.

• The cyclic subgroup <(1,2)> of S(3) is NOT normal since


(1,2,3)(1,2)(1,3,2) = (2,3).
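
The normality condition of Definition 4.40 can be checked exhaustively in S(3). A minimal sketch, assuming permutations of {1, 2, 3} are stored as tuples of images on {0, 1, 2} and composed right to left; the helper names are hypothetical:

```python
from itertools import permutations

def compose(p, q):
    """Return p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def cyclic_subgroup(x):
    """The set of all powers of the permutation x."""
    identity = tuple(range(len(x)))
    H, power = {identity}, x
    while power != identity:
        H.add(power)
        power = compose(power, x)
    return H

def is_normal(H, G):
    """Check that g h g^{-1} lies in H for all g in G and h in H."""
    return all(compose(compose(g, h), inverse(g)) in H for g in G for h in H)

G = list(permutations(range(3)))
transposition = (1, 0, 2)     # (1,2)
three_cycle = (1, 2, 0)       # (1,2,3)

# The conjugate (1,2,3)(1,2)(1,2,3)^{-1}; it should be the transposition (2,3).
conjugate = compose(compose(three_cycle, transposition), inverse(three_cycle))
```

With these conventions conjugate is the tuple (0, 2, 1), that is, the transposition (2,3), which does not belong to <(1,2)>. By contrast, is_normal reports that the subgroup generated by the 3-cycle is normal.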


The reason for introducing the concept of normality is explained by the next proposition 
and remark. 




PROPOSITION 4.44. Let H be a normal subgroup of the group G. The set of cosets
G/H = {gH; g ∈ G}
has a natural group structure by defining the product


(g_1H)(g_2H) = (g_1g_2)H for g_1 and g_2 ∈ G.


The coset H = eH is the identity element of this group known as the quotient group G
modulo H. For all g ∈ G, the coset g^{-1}H is the inverse of the coset gH.


PROOF. We are using left cosets; we could, of course, use right cosets. The only issue
is whether or not the multiplication is well defined. (Convince yourself that all the group
axioms do indeed hold.) So what we have to prove is that if g_1H = g_1'H and g_2H = g_2'H,
then (g_1g_2)H = (g_1'g_2')H. Since H is normal, left and right cosets coincide, so the facts
that g_1H = g_1'H and g_2H = g_2'H tell us that g_1'g_1^{-1} and g_2'g_2^{-1} ∈ H. This alone is not
enough to conclude that (g_1'g_2')(g_1g_2)^{-1} = g_1'g_2'g_2^{-1}g_1^{-1} ∈ H. We must use the fact that
H is normal in G once more. Since g_2'g_2^{-1} ∈ H and g_1' ∈ G we conclude that
g_1'(g_2'g_2^{-1})(g_1')^{-1} ∈ H. Next because H is a group (we do not need normality for this
step) and g_1'g_1^{-1} ∈ H, we also see that


g_1'g_2'g_2^{-1}(g_1')^{-1}g_1'g_1^{-1} = g_1'g_2'g_2^{-1}g_1^{-1} ∈ H


as required.


REMARK 4.45. Let H be a normal subgroup of the group G. 


(1) The map that sends g ∈ G to the coset gH is a surjective homomorphism of G onto
G/H with kernel H. We call it the canonical homomorphism of G onto G/H.

(2) The simplest example shows that the normality assumption is needed. We use
the fact that H = <(1,2)> is not a normal subgroup of G = S(3) to show
that multiplication on S(3)/<(1,2)> is not well defined. We try to multiply
(1,3)H with (2,3)H. If we use (1,3) as the representative for (1,3)H and (2,3)
as the representative for (2,3)H, then we get (1,3)(2,3) = (1,3,2) as the
representative for the product, which should therefore be (1,3,2)H. But we can
also use (1,3)(1,2) = (1,2,3) as the representative for (1,3)H, keeping (2,3) for
(2,3)H. If multiplication were well defined, then (1,3,2)H would equal
((1,2,3)(2,3))H = (1,2)H = H, from which we would conclude the false statement
that (1,3,2) ∈ H.

(3) For all g ∈ G, the conjugation θ_g is an automorphism of H.

(4) For commutative groups (G, +) the multiplicative coset notation gH is replaced
by the additive notation g + H. In this case G/H is also commutative.


REMARK 4.46. We discuss several examples of group homomorphisms.


(1) The map that sends the complex number z to its absolute value |z| is a homomorphism
from (C*, ·) onto (R_{>0}, ·) whose kernel consists of the complex numbers of absolute
value 1.

(2) The exponential map that sends x ∈ (R, +) to e^x ∈ (R_{>0}, ·) is an isomorphism
whose inverse is the logarithm map.

(3) More complicated is the complex exponential map that sends z ∈ (C, +) to
e^z ∈ (C*, ·). It is a surjective homomorphism with kernel 2πiZ ⊂ C.

(4) Let G be any group and a ∈ G. Left translation by a, T_a, is defined by


T_a(x) = ax for x ∈ G.


Then T_a ∈ Perm(G) and T defines an injective homomorphism from G to Perm(G).
See the next section (Cayley's theorem) for a more complete discussion.

(5) Specialize the above situation with the two dimensional real vector space G =
(R^2, +) = R × R.

(6) We construct one more group of homomorphisms (actually, a group of
automorphisms). Let G be a group and a ∈ G. Conjugation by a, σ_a, is defined by


σ_a(x) = axa^{-1} for x ∈ G.


Then σ_a ∈ Perm(G) and since for all a, b and x ∈ G,

σ_a(σ_b(x)) = abxb^{-1}a^{-1} = σ_{ab}(x),
the map σ that sends a ∈ G to σ_a ∈ Perm(G) is a group homomorphism. What is
its kernel?
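
The left translations of item (4) can be made concrete for G = S(3). The sketch below is an illustration only: elements of G are listed in a fixed order, and T(a) records, for each position in that list, the position of the product ax, so that T(a) is a permutation of {0, ..., 5}. The names compose, index and T are hypothetical.

```python
from itertools import permutations

def compose(p, q):
    """Return p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

G = list(permutations(range(3)))            # S(3) in a fixed order
index = {g: i for i, g in enumerate(G)}     # position of each element in that order

def T(a):
    """Left translation by a, written as a permutation of the positions 0..5."""
    return tuple(index[compose(a, x)] for x in G)

each_is_permutation = all(sorted(T(a)) == list(range(6)) for a in G)
is_injective = len({T(a) for a in G}) == len(G)
is_homomorphism = all(
    T(compose(a, b)) == compose(T(a), T(b)) for a in G for b in G
)
```

All three flags are true: each T(a) is a permutation of G, distinct elements give distinct translations, and T(ab) = T(a)∘T(b). This realizes S(3) as a group of permutations of a 6-element set.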


DEFINITION 4.47. An exact sequence of groups is a (perhaps infinite) collection of groups
{G_i} and homomorphisms θ_i : G_i → G_{i+1}:


... → G_{-1} → G_0 → G_1 → G_2 → G_3 → ...,


where the arrow from G_i to G_{i+1} is θ_i and

Im(θ_i) = ker(θ_{i+1}); in particular,

the composite homomorphism θ_{i+1}θ_i : G_i → G_{i+2} is the trivial homomorphism (it sends
every element of G_i to the identity element of G_{i+2}).


DEFINITION 4.48. A short exact sequence of groups is a diagram (a special case of the
last definition that is most useful):


{e} → G_1 → G_2 → G_3 → {e}, with θ_1 : G_1 → G_2 and θ_2 : G_2 → G_3,


where

the G_i for i = 1, 2, 3 are groups,

θ_1 and θ_2 are group homomorphisms,
θ_1 is a monomorphism (injective),
Im(θ_1) = ker(θ_2), and

θ_2 is an epimorphism (surjective).


We have seen that for every normal subgroup H of a group G,
{e} → H → G → G/H → {e}


is a short exact sequence.
EXERCISES 


(1) Let U be the group of complex numbers of absolute value 1 under multiplication.
Show that θ(x) = e^{2πix} defines a homomorphism of (R, +) onto U. What is the
kernel of this homomorphism?

(2) Show that R/Z is isomorphic to U.

(3) Do the powers of a three cycle in S(3) form a normal subgroup of S(3)?

(4) Describe the kernel of the homomorphism σ.


110 4. GROUP HOMOMORPHISMS AND ISOMORPHISMS. 


6. Isomorphisms 


In the abstract study of groups, a group and its isomorphic image are usually indistinguishable.
We already saw, for example, that for each positive integer n, there is, up to
isomorphism, only one cyclic group Z_n of order n, that Z_n × Z_m ≅ Z_{nm} provided (n,m) = 1,
and that D(3) ≅ S(3). One of the principal aims of this section is to properly place this last
isomorphism within a more general theory, as is done in the first subsection of this section.


6.1. Every group is a subgroup of a permutation group. 
THEOREM 4.49. (Cayley) Every group is isomorphic to a group of permutations. 


PROOF. Let G be a group. We must find a set X and a group G* of permutations of X
such that G ≅ G* (thus we will have shown that G is isomorphic to a subgroup of Perm(X)).
There is very little besides G to work with. We set X = G. For g ∈ G, we define a self-map
L_g of G by

L_g(x) = gx, x ∈ G

(L_g stands for multiplication on the left by g). It is obvious that L_g is a self-map of the set
G. It is one-to-one since for x_1 and x_2 ∈ G, gx_1 = gx_2 implies that x_1 = x_2. The map
is onto since for all y ∈ G, g^{-1}y ∈ G and L_g(g^{-1}y) = y. Thus L_g ∈ Perm(G). We let
G* = {L_g ; g ∈ G}. Obviously G* ⊂ Perm(G). Since Perm(G) is a group under composition,
composition is a binary operation on G*, and to show that it is a subgroup of Perm(G), we
must only establish the closure statement: for all g_1 and g_2 ∈ G, L_{g_1} ∘ (L_{g_2})^{-1} ∈ G*. This
follows from the obvious identities

L_{g_1} ∘ (L_{g_2})^{-1} = L_{g_1} ∘ L_{g_2^{-1}} = L_{g_1 g_2^{-1}}.

These identities are verified in a straightforward manner. As an example we show that for
all g ∈ G, (L_g)^{-1} = L_{g^{-1}}. This last equality means that L_g ∘ L_{g^{-1}} = L_{g^{-1}} ∘ L_g = id_G; it
follows from

L_g ∘ L_{g^{-1}} = L_{gg^{-1}} = L_e = L_{g^{-1}g} = L_{g^{-1}} ∘ L_g

and L_e = id_G.
We now have an obvious candidate for a homomorphism θ_l from G onto G*: for g ∈ G,
θ_l(g) = L_g. The map θ_l is a homomorphism since for g_1 and g_2 ∈ G,

θ_l(g_1 g_2) = L_{g_1 g_2} = L_{g_1} ∘ L_{g_2} = θ_l(g_1) ∘ θ_l(g_2).

The map θ_l is injective, since for g ∈ G, L_g is the identity map if and only if g = e; it is
surjective by definition. □
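The construction in the proof can be checked by machine on a small example. The following is a minimal sketch, assuming the group is S(3) with its elements stored as tuples of images; the names `op`, `left_translation` and `compose_maps` are illustrative choices and do not come from the text.

```python
from itertools import permutations, product

# Hypothetical example group: S(3), permutations of {0, 1, 2} stored as tuples of images.
elements = list(permutations(range(3)))

def op(g, h):
    """Group law: the composite g o h (apply h first, then g)."""
    return tuple(g[h[i]] for i in range(len(h)))

def left_translation(g):
    """L_g as a tuple: entry k is the image g * elements[k]."""
    return tuple(op(g, x) for x in elements)

def compose_maps(p, q):
    """Composite p o q of two self-maps of `elements`, each stored as a tuple of images."""
    index = {x: k for k, x in enumerate(elements)}
    return tuple(p[index[q[k]]] for k in range(len(elements)))

# The map g -> L_g of the proof.
L = {g: left_translation(g) for g in elements}

# Each L_g is a bijection of the underlying set.
each_is_permutation = all(sorted(L[g]) == sorted(elements) for g in elements)
# Distinct group elements give distinct translations.
is_injective = len(set(L.values())) == len(elements)
# L_{gh} = L_g o L_h for all g, h.
is_homomorphism = all(L[op(g, h)] == compose_maps(L[g], L[h])
                      for g, h in product(elements, repeat=2))
```

Here each translation is a permutation of the six elements of S(3), so the sketch realizes S(3) inside a permutation group on six symbols rather than three.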


DEFINITION 4.50. The isomorphism θ_l : G → G* is called the left regular representation
of G.


REMARK 4.51. It should be recognized that the map L_g, g ∈ G, used in this section
corresponds to (is the same as) the translation map T_a, a ∈ G, used in the previous
section.


In the next section we generalize this result. 




6.2. Solvable groups. In the study of the structure of groups, the following concept 
turns out to be extremely useful. 


DEFINITION 4.52. Let G be a group. We say that G is solvable if there exists a finite
sequence of subgroups {H_i} of G:

(16) G = H_0 ⊃ H_1 ⊃ H_2 ⊃ ... ⊃ H_r = {e}

such that H_i is normal in H_{i-1} and the factor group H_{i-1}/H_i is abelian for i = 1, ..., r.


All abelian groups are obviously solvable. So are the groups S(n) for n = 2, 3 or 4
(Exercise). We establish that for each n ≥ 5, the permutation group S(n) is not solvable.
We need some preliminaries.


THEOREM 4.53. Let H be a normal subgroup of G. Then G/H is abelian if and only if
aba^{-1}b^{-1} ∈ H for all a and b ∈ G.


PROOF. Let θ : G → G/H be the canonical homomorphism. Assume that G/H is
abelian. For all a and b ∈ G,

θ(aba^{-1}b^{-1}) = θ(a)θ(b)θ(a^{-1})θ(b^{-1}) = e_{G/H}.

Thus aba^{-1}b^{-1} ∈ H. Conversely, assume that aba^{-1}b^{-1} ∈ H for all a and b ∈ G. Let A and
B ∈ G/H. Since θ is surjective, there exist a and b ∈ G such that A = θ(a) and B = θ(b).
Thus

e_{G/H} = θ(e_G) = θ(aba^{-1}b^{-1}) = ABA^{-1}B^{-1},

from which it follows readily that BA = AB. □


PROPOSITION 4.54. Let H and N be subgroups of S(n) with n ≥ 5 and N normal in H.
If H contains every 3-cycle and H/N is abelian, then N contains every 3-cycle.


PROOF. We take two 3-cycles in S(n) with exactly one element in common, without
loss of generality, σ = (1,2,3) and τ = (3,4,5). By hypothesis both of these belong to H
and since H/N is abelian, by the previous theorem, (4,3,1) = στσ^{-1}τ^{-1} ∈ N. Since n ≥ 5,
every 3-cycle (a,b,c) in S(n) arises in this way after relabeling: choose distinct d and e
outside {a,b,c} and take σ = (c,d,b) and τ = (b,a,e). We have completed the argument. □
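The commutator computation in the proof is easy to confirm by machine. The following is a minimal sketch, assuming permutations of {1, ..., 5} stored as dictionaries and composed from right to left; the helper names `compose`, `inverse` and `cycle` are illustrative and not from the text.

```python
def compose(p, q):
    """Return p o q (apply q first, then p) for permutations stored as dicts."""
    return {x: p[q[x]] for x in q}

def inverse(p):
    """Return the inverse permutation."""
    return {v: k for k, v in p.items()}

def cycle(points, n=5):
    """Permutation of {1, ..., n} given by the single cycle `points`."""
    perm = {x: x for x in range(1, n + 1)}
    for a, b in zip(points, points[1:] + points[:1]):
        perm[a] = b
    return perm

sigma = cycle((1, 2, 3))
tau = cycle((3, 4, 5))

# sigma tau sigma^{-1} tau^{-1}, read from right to left.
commutator = compose(compose(sigma, tau), compose(inverse(sigma), inverse(tau)))
```

With this convention `commutator` agrees with `cycle((4, 3, 1))`, the 3-cycle named in the proof; with the opposite (left to right) convention one obtains a different 3-cycle, which serves the argument equally well.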


THEOREM 4.55. For each n ∈ Z_{≥5}, S(n) is not solvable.


PROOF. Using the notation of the definition of solvability, we conclude by induction that
each H_i contains all 3-cycles, which contradicts the fact that H_r is the trivial group. □


6.3. MORE sections to be included. Consider what we should have in this section. 


EXERCISES 
(1) The right regular representation of the group G is defined by the map θ_r : G → G*_r,
where
θ_r(g) = R_g,
R_g(x) = xg^{-1} for x ∈ G
and
G*_r = {R_g ; g ∈ G}.
Show that G*_r is a subgroup of Perm(G) and that θ_r is an isomorphism of G onto
G*_r.
(2) Relate θ_l to θ_r.
(3) What can you conclude about θ_r if we were to define R_g by R_g(x) = xg, g ∈ G?
(4) In this exercise we study the group A(4).
• What are the possible orders of the subgroups of A(4)?
• Write an element π ∈ A(4) as a disjoint product of cycles. Describe the products
that can possibly appear.
• Which of the following appear as isomorphic images of subgroups of A(4):
(Z3,+), Zo, Zs, Zz xX Zo? 
(5) Prove that S(n) is solvable for n = 2, 3 and 4. 
(6) Is A(n), n ≥ 5, solvable? Proof required.


CHAPTER 5 


Algebraic structures 


The main structures studied in this chapter are commutative rings and their ideals,
especially the ring of integers and the ring of complex polynomials. We explore many of the
similarities and some of the differences between these two structures.


1. A collection of algebraic structures 
We start with a rather weak structure. 


DEFINITION 5.1. A semigroup (S,*) is a set S together with an associative binary operation
* on it. It has an identity if there is an element e ∈ S such that e * s = s = s * e for
all s ∈ S.

EXAMPLE 5.2. We continue with some simple observations.


• Every group is a semigroup.

• The integers with multiplication (Z,·) form a semigroup with identity element 1,
but not a group.

• Let X be any set. The set F(X) of all functions from X to itself is a semigroup
under composition ∘ with identity id. If |X| > 1, then (F(X),∘) is not a group.


More interesting are the structures that are stronger than groups. 
DEFINITION 5.3. A ring (R,+,·) is a set R together with two binary operations, addition
+ and multiplication · (usually dropped entirely from expressions), such that (R,+) is an
abelian group (thus the (additive) identity element of R is denoted by 0 and called zero) and


• multiplication is associative; that is, for all x, y and z ∈ R,
x(yz) = (xy)z,

• the distributivity laws hold; that is, for all x, y and z ∈ R,
x(y + z) = xy + xz
and
(x + y)z = xz + yz,

and
• R has a multiplicative identity 1; that is, there exists a 1 ≠ 0 in R such that 1x =
x1 = x for all x ∈ R.¹


The ring (R,+,-) is usually abbreviated by R. We need to specify various kinds of rings. 


DEFINITION 5.4. A non-empty subset S of (R,+,·) is a subring of R if it is a ring with
respect to the ring operations + and · of R.


¹CAUTION: this axiom is not always required in the definition of rings.


114 5. ALGEBRAIC STRUCTURES 
It is easy to establish the following 


PROPOSITION 5.5. A non-empty subset S of (R,+,·) is a subring if and only if for all a
and b ∈ S
(a) 1 ∈ S,
(b) a - b ∈ S, and
(c) ab ∈ S.


DEFINITION 5.6. A ring (R,+,·)


• is commutative if its multiplication is; that is, if xy = yx for all x and y ∈ R.

• is an integral domain if it is a commutative ring without zero divisors. A zero
divisor in the ring R is a non-zero a ∈ R for which there exists a non-zero b ∈ R
such that ab = 0.

• is a field if it is a commutative ring in which every non-zero element has a multiplicative
inverse; that is, for all 0 ≠ x ∈ R, there exists a y ∈ R such that xy = 1.

• Let a and b be elements of the commutative ring R. It has been our practice to
denote the additive inverse of b by -b and a + (-b) by a - b. Similarly if b has a
multiplicative inverse b^{-1}, it is customary to denote ab^{-1} also by a/b.


REMARK 5.7. In the setting of Proposition 5.5 we shall say that R is an extension of S.
This will be particularly useful language in our discussions of subfields of C.


EXAMPLE 5.8. A discussion of various important examples follows. 


(1) We studied in great detail the ring (Z,+,·). It is an integral domain, but not a
field. Under our definitions, (2Z,+,·) certainly satisfies all the properties to be an
integral domain, but it is not even a subring of Z because it does not contain a
(multiplicative) identity.

(2) The rationals Q, reals R, and complex numbers C are fields. Do the quaternions
H form an integral domain or a field? Why? Each non-zero element of H has an
inverse (see the exercise below).

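(3) The set M_2(Z) of 2 × 2 matrices with integer entries is a ring under matrix addition
and multiplication with identity

\[ I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \]

but not an integral domain since

\[ \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}. \]

The ring is not commutative since, for example,

\[ \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix} \neq \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 2 \\ 0 & 1 \end{pmatrix}. \]

(4) Similar properties hold for the rings M_2(Q), M_2(R) and M_2(C).
(5) For every integer n ≥ 2, (Z_n,+,·) is a commutative ring with identity. It has zero
divisors if n is composite and it is a field for n prime (see next set of exercises).
(6) Define

Z[√2] = {a + b√2 ; a, b ∈ Z} ⊂ R

and endow it with the usual addition and multiplication it inherits as a subset of
R. Then (Z[√2],+,·) is an integral domain with an identity.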


(7) So is

Z[i] = {a + bi ; a, b ∈ Z} ⊂ C.

(8) The last two integral domains are not fields. We can enlarge them to get fields
Q[√2] and Q[i]:

Z[√2] ⊂ Q[√2] = {a + b√2 ; a, b ∈ Q} ⊂ R
and
Z[i] ⊂ Q[i] = {a + bi ; a, b ∈ Q} ⊂ C.

(9) The last set of examples is part of the story of the first appearance of fields in the
study of mathematics (see Chapter 9). Let n be a positive integer and consider a
monic polynomial² P(x) of degree n

P(x) = x^n + a_{n-1}x^{n-1} + ... + a_1 x + a_0

with rational coefficients a_i. The fundamental theorem of algebra³ tells us that P(x)
has precisely n roots counting multiplicities; thus at most n distinct roots. There
is a smallest field F consisting of complex numbers and containing these roots. The
study of the roots of P(x) is facilitated by the field F. For degree one polynomials,
F = Q. For degree two polynomials, there are already infinitely many candidates
for F (these fields can be divided however into finitely many classes). The examples
Q[√2] and Q[i] belong to distinct classes and correspond to the polynomials x^2 - 2
and x^2 + 1, respectively.
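The claims in item (5) about Z_n can be explored by brute force for small n. The following is a minimal sketch, assuming n ≥ 2 and representing Z_n by the integers 0, ..., n - 1; the function names are illustrative and not from the text.

```python
def zero_divisors(n):
    """Non-zero a in Z_n for which some non-zero b in Z_n has ab = 0."""
    return [a for a in range(1, n)
            if any(a * b % n == 0 for b in range(1, n))]

def is_field(n):
    """True when every non-zero element of Z_n has a multiplicative inverse."""
    return all(any(a * b % n == 1 for b in range(1, n))
               for a in range(1, n))

composite_case = zero_divisors(6)   # 6 is composite
prime_case = zero_divisors(7)       # 7 is prime
```

For n = 6 the zero divisors found are 2, 3 and 4, while for n = 7 the list is empty and `is_field(7)` holds. A finite search of this kind only illustrates the statement; the proof is the subject of the exercises.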


Rings share many properties with the integers. An example is

PROPOSITION 5.9. Let R be a ring. Then x0 = 0 = 0x for all x ∈ R.

PROOF. Using the axioms for rings
0x + 0x = (0 + 0)x = 0x.
Thus also
(0x + 0x) + (-0x) = 0x + (-0x) = 0.
But the left-hand side of the last equation is
0x + (0x + (-0x)) = 0x.
Hence 0x = 0. The proof that x0 = 0 is similar and left to the reader. □


DEFINITION 5.10. A map θ from a ring R to a ring S is a (ring) homomorphism if
(1) θ(a + b) = θ(a) + θ(b) and
(2) θ(ab) = θ(a)θ(b) for all a and b ∈ R, and
(3) θ(1) = 1.
As usual, θ is an isomorphism if it is both injective and surjective. An isomorphism
of the ring R onto itself is an automorphism of R.


We define two more general structures. The first of these should be familiar to most 
readers. 


²See the next section for more information on polynomials.
³Its “easiest” proof uses complex analysis.


DEFINITION 5.11. Let K be a field.⁴ A vector space V (over the field K) is an abelian
group V (written additively) together with a scalar multiplication (of elements of V by
elements of K); that is, an operation⁵ that assigns to each scalar λ ∈ K and each vector
v ∈ V, a vector (written as) λv ∈ V such that

(1) for all v ∈ V, 1v = v,
(2) for all λ and μ ∈ K and all v ∈ V, (λμ)v = λ(μv),
(3) for all λ and μ ∈ K and all v ∈ V, (λ + μ)v = λv + μv, and⁶
(4) for all λ ∈ K and all u and v ∈ V, λ(u + v) = λu + λv.

If V is also a ring (thus with a second multiplication operation which maps ordered pairs of
vectors into their product; V = (V,+,·)), then it is called a (K-)algebra provided the two
multiplications (ring multiplication in V and scalar multiplication) are related by

λ(uv) = (λu)v = u(λv) for all λ ∈ K and all u and v ∈ V.


EXAMPLE 5.12. Some examples that should be familiar to the reader as well as some 
examples that are not so familiar follow. 


(1) Most elementary linear algebra courses (books) are devoted to a study of the vector
spaces R^n over R consisting of n-tuples (λ_1, ..., λ_n) of real numbers and the vector
space C^n over C where the n-tuples (λ_1, ..., λ_n) consist of complex numbers.

(2) Every vector space over C is automatically a vector space over R. For (λ_1, ..., λ_n) ∈
C^n, we can use the decomposition

λ_j = a_j + b_j i

of each component into its real and imaginary part, to construct a canonical identification

C^n ∋ (λ_1, ..., λ_n) ↦ (a_1, ..., a_n, b_1, ..., b_n) ∈ R^{2n}.


(3) Sets of polynomials form interesting vector spaces. They are studied in the next 
section. 

(4) Let J be any interval in R. The space C_R(J) of continuous real valued functions
on J is an R-algebra with the usual definitions of addition and multiplication of
functions.

(5) The constructions and definitions of the last example also hold with R replaced by
Q or C. We can also replace J by any topological space, that is, a space where the
concept of continuity makes sense.

(6) We fix a prime p. The set M_2(Z_p) of 2 × 2 matrices with entries from the field Z_p
with the usual matrix operations form a Z_p-algebra.

(7) The set of 2 × 2 matrices of the form

\[ a \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + b \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \]

with a and b ∈ R with the usual matrix operations form another model for the
complex numbers C; the matrices

\[ I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad\text{and}\quad J = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \]

represent 1 (the identity) and i, respectively. In particular, a complex number a + bi
can also be considered as the real 2 × 2 matrix

\[ \begin{pmatrix} a & -b \\ b & a \end{pmatrix}. \]


⁴For most of the interesting examples K is Q, R or C.

⁵Thus a map from K × V into V.

⁶Note that in the last and next equation, the + sign is used in a different sense on the two sides of the
equal sign. The + sign on the left hand-side refers to scalar addition in the field K; while the + sign on the
right hand-side refers to vector addition in V. Similarly the symbol 0 stands for both the additive identity
in the field K and the identity in the group V.


(8) Fix a positive integer n. The sets of n × n matrices M_n(Z), M_n(Q), M_n(R) and
M_n(C), with respectively, integer, rational, real and complex entries are much studied
objects in many branches of theoretical and applicable mathematics.
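The matrix model of C in item (7) can be tried out numerically. The following is a minimal sketch, assuming the correspondence that sends a + bi to the matrix with rows (a, -b) and (b, a); the helper names `M`, `mat_add` and `mat_mul` are illustrative, and the two sample numbers are arbitrary.

```python
def M(c):
    """Real 2 x 2 matrix (as nested tuples) attached to the complex number c = a + bi."""
    a, b = c.real, c.imag
    return ((a, -b), (b, a))

def mat_add(A, B):
    """Entrywise sum of two 2 x 2 matrices."""
    return tuple(tuple(A[i][j] + B[i][j] for j in range(2)) for i in range(2))

def mat_mul(A, B):
    """Usual product of two 2 x 2 matrices."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

c1, c2 = complex(1, 2), complex(3, -1)   # arbitrary sample values

sum_matches = mat_add(M(c1), M(c2)) == M(c1 + c2)
product_matches = mat_mul(M(c1), M(c2)) == M(c1 * c2)
# The matrix attached to i squares to the matrix attached to -1.
i_squared = mat_mul(M(1j), M(1j)) == M(complex(-1, 0))
```

Checking two sample numbers proves nothing in general; the general statement is one of the exercises at the end of this section.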


REMARK 5.13. In the rest of this chapter we will use elementary properties of vector
spaces and linear maps between them. In particular, we will use the following properties:
Let X = {X_1, X_2, ..., X_n} be a finite set of vectors in a vector space V over the field K.


• We say that the set X is a spanning set for V if every vector v ∈ V is a linear combination
of vectors in X; that is, there exist constants λ_i such that v = \sum_{i=1}^{n} λ_i X_i.
In this case we say that the vector space is finite dimensional.

• The set X is linearly independent if a relation of the form 0 = \sum_{i=1}^{n} λ_i X_i with the
constants λ_i ∈ K implies that each λ_i = 0.

• For finite dimensional vector spaces, the dimension of the space can be defined
as the minimum number of spanning vectors or the maximum number of linearly
independent vectors in the space.


SOME MATERIAL ON DETERMINANTS AND DIAGONALIZATION OF MATRI- 
CES NEEDED FOR FUTURE CHAPTERS. 


EXERCISES 


(1) Show that the semigroup (F(X),∘) has the following weak form of the cancellation
property. Let f, g and h ∈ F(X). If f ∘ g = f ∘ h and f is injective, then g = h. Show
conversely that if for some fixed f ∈ F(X), we have that for all g and h ∈ F(X),
f ∘ g = f ∘ h implies that g = h, then f is injective.

Similarly show that f ∈ F(X) is surjective if and only if for all g and h ∈ F(X),
g ∘ f = h ∘ f implies that g = h.

(2) Show that every non-zero element in the quaternions H has a (multiplicative) inverse.

(3) We have seen that there is a map M that assigns to the complex number c = a + bi,
with a and b ∈ R, the real 2 × 2 matrix

\[ \begin{pmatrix} a & -b \\ b & a \end{pmatrix}. \]

Call this set of matrices ℳ.
• Show that ℳ is a field with respect to matrix addition and multiplication.
• Show that for two complex numbers c_1 and c_2,

M(c_1 + c_2) = M(c_1) + M(c_2) and M(c_1 c_2) = M(c_1)M(c_2).

• Show that M is thus an isomorphism between C and ℳ.
(4) Prove that a finite integral domain must be a field.




Most of our examples on rings will involve two cases: the integers (Z,+,-) that were 
already studied in great detail and spaces of polynomials whose study is begun in the next 
section. 


2. The algebra of polynomials 
We begin with a 


DEFINITION 5.14. A (complex) polynomial p(x) is a formal expression in an indeterminate
x and its powers of the form

(17) p(x) = a_0 + a_1 x + ... + a_n x^n,

where n ∈ Z_{≥0} and a_i ∈ C for i = 0, 1, ..., n. For each non-negative integer k we view x^k
as the k-th power of x and thus x^0 = 1. We denote the set of polynomials by C[x]. If we
restrict the domain of coefficients to be respectively the reals, rationals, integers, we get
respectively the sets of real polynomials, rational polynomials and integer polynomials. With
obvious notational conventions, we have the proper set inclusions

C[x] ⊃ R[x] ⊃ Q[x] ⊃ Z[x].

If a_n ≠ 0, then we say the polynomial p(x) has degree n and write deg p(x) = n, and we call
a_n the leading coefficient of p(x). The polynomial p(x) is monic if a_n = 1.


REMARK 5.15. (1) The degree has not been defined for the identically zero polynomial
(n = 0 = a_0). It is convenient to define the degree of that polynomial, which
will be denoted by 0, to be -∞ and to regard -∞ < d for all d ∈ Z_{≥0}.

(2) The constants are a subset of the polynomials: C ⊂ C[x]. The non-zero constants
C_{≠0} are precisely the polynomials of degree 0.

(3) We will work mostly with complex polynomials. The reader should decide what
changes, if any, are required for more restrictive classes of polynomials. Because Z
is not a field, the Z[x] theory is significantly different from the C[x] theory.

(4) The complex polynomial p(x) can be regarded both as a formal expression in its
own right and as a continuous (it has many more properties) self-map of C. In this
context, it is usually written as

p : C → C,

and p(x) denotes the value of the function p at the point x ∈ C. Real (rational,
integral) polynomials define self maps of R (Q, Z). We can, of course, use results
from calculus when considering elements of R[x]. We will use below at least one such
result.


It is convenient to write

\[ p(x) = \sum_{i=0}^{n} a_i x^i = \sum_{i=0}^{\infty} a_i x^i, \]

where in the last sum it is understood that a_i = 0 for all but finitely many indices i. With
this convention we introduce several binary operations:

addition of polynomials (on C[x] × C[x])

\[ \sum_{i=0}^{\infty} a_i x^i + \sum_{i=0}^{\infty} b_i x^i = \sum_{i=0}^{\infty} (a_i + b_i) x^i, \]

multiplication of polynomials by scalars (on C × C[x])

\[ \lambda \sum_{i=0}^{\infty} a_i x^i = \sum_{i=0}^{\infty} (\lambda a_i) x^i, \]

and multiplication of polynomials (on C[x] × C[x])

\[ \left( \sum_{i=0}^{\infty} a_i x^i \right) \left( \sum_{i=0}^{\infty} b_i x^i \right) = \sum_{i=0}^{\infty} \left( \sum_{j=0}^{i} a_j b_{i-j} \right) x^i. \]

Multiplication of polynomials by scalars is a special case of the binary operation of multiplication
of polynomials. A tedious checking of the many axioms shows that C[x] with the
above algebraic binary operations is a C-algebra. Its dimension as a vector space over C
is ∞. The reader should check that the above binary operations (especially multiplication
of polynomials) are the ones familiar from high school algebra. As remarked earlier, the
C-algebra C[x] is a subalgebra of C_C(C), the space of continuous complex valued functions
of a complex variable. As such its study is also a part of analysis. The reader should be
convinced that the formal operations of addition and multiplication of polynomials do agree
with the corresponding concepts when the polynomials are viewed as functions.
The function degree is a map 


deg : C[x] → Z_{≥0} ∪ {-∞}.
The most important, for our applications, properties of this map are summarized in


THEOREM 5.16. Let a(x) and b(x) ∈ C[x]. Then
(a) deg (a(x) + b(x)) ≤ max{deg (a(x)), deg (b(x))}, and
(b) deg (a(x)b(x)) = deg (a(x)) + deg (b(x)).

PROOF. The proof is completely straightforward and hence left to the reader. We
remark that in the arithmetic for (Z_{≥0} ∪ {-∞}, +) that we are using, -∞ + a = -∞ for
all a ∈ Z_{≥0} ∪ {-∞}. □
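The formal operations on polynomials and the two degree rules translate directly into a few lines of code. The following is a minimal sketch, assuming a polynomial is stored as its list of coefficients [a_0, a_1, ..., a_n] and the zero polynomial as the empty list; the names `trim`, `deg`, `add` and `mul` are illustrative, and `-math.inf` stands in for the symbol -∞.

```python
import math

def trim(p):
    """Drop trailing zero coefficients so the last entry is the leading coefficient."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def deg(p):
    """Degree of p, with the convention deg 0 = -infinity."""
    p = trim(p)
    return len(p) - 1 if p else -math.inf

def add(p, q):
    """Coefficientwise sum."""
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return trim(a + b for a, b in zip(p, q))

def mul(p, q):
    """Product: the coefficient of x^i is the sum of a_j * b_{i-j}."""
    p, q = trim(p), trim(q)
    if not p or not q:
        return []
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

a = [1, 0, 2]     # 1 + 2x^2
b = [3, 1, -2]    # 3 + x - 2x^2
```

In this example the leading terms cancel in the sum, so `deg(add(a, b))` is 1, strictly below the maximum of the two degrees, while `deg(mul(a, b))` is 4. The convention for -∞ also behaves as in the remark: `deg(mul(a, []))` and `deg(a) + deg([])` are both `-math.inf`.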


REMARK 5.17. In our study of the integers, the absolute value was a useful tool in 
determining the size of an integer (in existence arguments, for example). We shall see that 
we can use the degree to assign a size to a polynomial in many arguments. 


For each non-negative integer n, we let C_n[x] denote the set of polynomials of degree
≤ n. It is clear that C_n[x] is a vector subspace of C[x] of dimension n + 1 and that for all n
and m ∈ Z_{≥0}, the multiplication map

(18) M : C_n[x] × C_m[x] → C_{n+m}[x]

which assigns to each ordered pair (p(x), q(x)) ∈ C_n[x] × C_m[x] its product p(x)q(x) ∈
C_{n+m}[x] is a surjection.
A useful alternate form of writing (17) is

(19) p(x) = A(x - α_1) ... (x - α_n),

with A and the collection of α_i ∈ C. It is quite easy to go from (19) to (17):

\[ a_j = (-1)^{n-j} A \sum_{i_1 < i_2 < \cdots < i_{n-j}} \alpha_{i_1} \alpha_{i_2} \cdots \alpha_{i_{n-j}}. \]

It is particularly easy to conclude that

a_0 = (-1)^n A α_1 α_2 ... α_n, a_{n-1} = -A(α_1 + α_2 + ... + α_n), a_n = A;

equations that will be used many times in the sequel.
The journey from (17) to (19) is not so quick and requires some very non-trivial mathematics,
the Fundamental theorem of algebra discussed in Chapter 7.
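The passage from the factored form (19) to the coefficient form (17) is mechanical, as the following minimal sketch illustrates. It assumes coefficients are listed from degree 0 upward; the name `from_roots` and the sample values A = 2 and roots 1, 2, 3 are illustrative choices, not taken from the text.

```python
def from_roots(A, roots):
    """Expand A (x - r_1) ... (x - r_n) into the coefficient list [a_0, ..., a_n]."""
    coeffs = [A]                                     # the constant polynomial A
    for alpha in roots:
        shifted = [0] + coeffs                       # x * (current polynomial)
        scaled = [-alpha * c for c in coeffs] + [0]  # -alpha * (current polynomial)
        coeffs = [s + t for s, t in zip(shifted, scaled)]
    return coeffs

A, roots = 2, [1, 2, 3]
coeffs = from_roots(A, roots)     # 2(x - 1)(x - 2)(x - 3)
n = len(roots)

constant_term = (-1) ** n * A * roots[0] * roots[1] * roots[2]
next_to_leading = -A * sum(roots)
```

Here `coeffs` is [-12, 22, -12, 2], so a_0 equals `constant_term`, a_{n-1} equals `next_to_leading`, and a_n equals A, in agreement with the three displayed equations.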


EXERCISES 


(1) Show that the map M of (18) is surjective. Let p(x) ∈ C_{n+m}[x]. Describe
M^{-1}(p(x)).

HINT: You may (and probably should) use the Fundamental theorem of algebra.

(2) Let a(x) and b(x) ∈ C[x]. What happens if you try to divide the polynomial b(x)
by the polynomial a(x)? Is there a different answer if you assume that a(x) and
b(x) ∈ Z[x]?


2.1. The vector space of polynomials of degree n. We are assuming that the
reader has some familiarity with the concepts of linear algebra and use it to study in detail
the vector space C_n[x]; here n is an arbitrary non-negative integer. As we already observed,
this vector space (over C) has dimension n + 1. A convenient basis for C_n[x] consists of the
n + 1 vectors

1, x, x^2, ..., x^n.

The polynomial (17) can be represented by the column vector ((n + 1) × 1 matrix) with
entries a_0, a_1, ..., a_n, and a linear operator

T : C_n[x] → C_m[x]

can be represented with respect to such bases by an (m + 1) × (n + 1) matrix M_T whose
j-th column is the vector (α_1, α_2, ..., α_{m+1}) provided that the operator T sends the vector
x^{j-1} ∈ C_n[x] to the vector \sum_{i=1}^{m+1} α_i x^{i-1} ∈ C_m[x]. Thus if v ∈ C^{n+1} is the vector of
coefficients of p(x) ∈ C_n[x], then the vector of coefficients of its image is M_T v ∈ C^{m+1}.
Review linear operator, kernel or null space of an operator and image of an operator as well
as the


THEOREM 5.18. Let L be a linear operator from a vector space V to a vector space W.
Then the dimension of V equals the dimension of the kernel of L plus the dimension of its
image (the vector space L(V) ⊂ W).
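As an illustration of the matrix of an operator and of Theorem 5.18, we take for T the differentiation operator from C_3[x] to C_2[x]. The following is a minimal sketch under that assumption; the names `derivative_matrix`, `apply` and `rank` are illustrative, and the rank is computed by ordinary Gaussian elimination over the rationals.

```python
from fractions import Fraction

def derivative_matrix(n):
    """Matrix of D : C_n[x] -> C_{n-1}[x] in the bases 1, x, x^2, ... (n >= 1)."""
    M = [[0] * (n + 1) for _ in range(n)]
    for j in range(1, n + 1):
        M[j - 1][j] = j       # D(x^j) = j x^(j-1): column j holds j in row j-1
    return M

def apply(M, v):
    """Matrix times column vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rank(M):
    """Rank by Gaussian elimination with exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

D = derivative_matrix(3)
image = apply(D, [5, 2, -3, 1])   # coefficients of 5 + 2x - 3x^2 + x^3
```

The vector `image` is [2, -6, 3], the coefficients of 2 - 6x + 3x^2. The rank of `D` is 3 and the constants span its kernel, so the dimensions add up as 1 + 3 = 4, the dimension of C_3[x].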


2.2. The Euclidean algorithm (for polynomials). We start with 


DEFINITION 5.19. Let a(x) and b(x) ∈ C[x]. We say that a(x) divides b(x) and write
a(x)|b(x) if there exists a q(x) ∈ C[x] such that b(x) = a(x)q(x). We write in this case
q(x) = b(x)/a(x).

THEOREM 5.20 (The division algorithm). Let a(x) and b(x) be polynomials and assume
that a(x) is not the zero polynomial (thus deg a(x) > -∞). There exist unique polynomials
q(x) and r(x) such that
(a) b(x) = a(x)q(x) + r(x), and
(b) deg r(x) < deg a(x).


PROOF. The reader should note the similarities of the statement and proof to those of
Theorem 1.13 of Chapter 1. In each case we are dividing one quantity by a second quantity
to obtain a quotient and a remainder. The remainder should be “smaller” than the second
quantity. The measurement of smallness in the case of integers was obvious; for polynomials,
it is measured by the degree. Just like in the case of integers, the proof has two parts.


Existence: If a(x) divides b(x), set q(x) = b(x)/a(x) and r(x) = 0.

Assume now that a(x) does not divide b(x). This forces the degree of a(x) to be positive.
Let

D = {deg (b(x) - a(x)k(x)); k(x) ∈ C[x]}.

We claim that D ⊂ Z_{≥0}. For if there exists a k(x) ∈ C[x] such that b(x) - a(x)k(x) = 0,
then a(x) would divide b(x). Let d be the greatest lower bound for D. Then there exists
a polynomial q(x) ∈ C[x] such that r(x) = b(x) - a(x)q(x) has degree d. If d = 0, then
certainly d = deg r(x) < deg a(x). We claim that also if d > 0, then d < deg a(x). So
assume that d > 0. If d ≥ deg a(x), we write

r(x) = α_0 x^d + α_1 x^{d-1} + ... + α_d, α_0 ≠ 0
and
a(x) = β_0 x^n + β_1 x^{n-1} + ... + β_n, β_0 ≠ 0, d ≥ n.
Since

\[ b(x) - a(x)\left[ q(x) + \frac{\alpha_0}{\beta_0} x^{d-n} \right] = r(x) - a(x)\left[ \frac{\alpha_0}{\beta_0} x^{d-n} \right] = \left( \alpha_1 - \frac{\alpha_0 \beta_1}{\beta_0} \right) x^{d-1} + \cdots \]

has non-negative degree ≤ (d - 1), we have reached a contradiction to the fact that d is the
smallest element of D.


Uniqueness: Write b(x) = a(x)q(x) + r(x) = a(x)q_1(x) + r_1(x), where q(x), q_1(x), r(x)
and r_1(x) ∈ C[x], and also deg r(x) < deg a(x) > deg r_1(x). Then

a(x)[q(x) - q_1(x)] = r_1(x) - r(x).

If q(x) ≠ q_1(x), then

deg (a(x)[q(x) - q_1(x)]) ≥ deg (a(x))

while

deg (r_1(x) - r(x)) ≤ max{deg (r_1(x)), deg (r(x))} < deg (a(x)),

which is impossible. Thus q(x) = q_1(x) and hence also r(x) = r_1(x). □
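The existence half of the argument is effectively a procedure: as long as the remainder has degree at least deg a(x), cancel its leading term. The following is a minimal sketch of that procedure, assuming rational coefficients stored from degree 0 upward; the name `poly_divmod` is illustrative and the sample polynomials are the pair used later in this section.

```python
from fractions import Fraction

def poly_divmod(b, a):
    """Return (q, r) with b = a*q + r and deg r < deg a.

    Polynomials are coefficient lists [c_0, c_1, ...]; a must be non-zero.
    """
    a = [Fraction(c) for c in a]
    r = [Fraction(c) for c in b]
    while a and a[-1] == 0:
        a.pop()
    if not a:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(r) - len(a) + 1, 1)
    while True:
        while r and r[-1] == 0:          # drop zero leading coefficients
            r.pop()
        if len(r) < len(a):              # now deg r < deg a
            break
        shift = len(r) - len(a)          # difference of the two degrees
        factor = r[-1] / a[-1]           # ratio of the leading coefficients
        q[shift] += factor
        for i, c in enumerate(a):        # subtract factor * x^shift * a(x)
            r[i + shift] -= factor * c
    return q, r

b = [0, -3, 7, -5, 1]    # x^4 - 5x^3 + 7x^2 - 3x
a = [0, 2, -3, 1]        # x^3 - 3x^2 + 2x
q, r = poly_divmod(b, a)
```

For this pair the quotient is x - 2 and the remainder is -x^2 + x, that is, `q == [-2, 1]` and `r == [0, 1, -1]`. Exact fractions are used because floating point round-off would make the test for a zero leading coefficient unreliable.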


THEOREM 5.21. Let a(x) and b(x) be polynomials and assume that not both of these
are the zero polynomial. There exists a unique monic polynomial d(x) = (a(x), b(x)) =
gcd(a(x), b(x)) of degree ≥ 0 such that
(a) d(x)|a(x) and d(x)|b(x), and
(b) whenever c(x) ∈ C[x] divides both a(x) and b(x), it also divides d(x).

PROOF. We are again modeling our proof on the corresponding theorem for the integers.
Let

𝒟 = {a(x)s(x) + b(x)t(x); s(x) and t(x) ∈ C[x] and a(x)s(x) + b(x)t(x) ≠ 0}.

The set 𝒟 is not empty (it contains either a(x) or b(x)). Let

D = {d ∈ Z; d = deg (p(x)) for some p(x) ∈ 𝒟}.

The set D is not empty because 𝒟 is not, and it does not contain -∞ because 0 ∉ 𝒟. Hence
D is a non-empty set of integers that is bounded from below (by 0). It hence contains a
smallest (non-negative) element δ. We can thus find a monic polynomial d(x) = a(x)s_0(x) +
b(x)t_0(x) ∈ 𝒟 whose degree is δ (d(x) cannot be the zero polynomial).

The proof of (b) is rather simple. We first note that c(x) ≠ 0 as otherwise both a(x) =
0 = b(x). Obviously c(x) divides d(x) = a(x)s_0(x) + b(x)t_0(x).

We proceed to the proof of (a). By the division algorithm a(x) = d(x)q(x) + r(x) where
r(x) and q(x) ∈ C[x] with deg (r(x)) < δ = deg (d(x)). Thus

r(x) = a(x) - d(x)q(x) = a(x) - q(x)[a(x)s_0(x) + b(x)t_0(x)]
= a(x)[1 - q(x)s_0(x)] + b(x)[-q(x)t_0(x)],

and if r(x) were not the zero polynomial, it would certainly belong to 𝒟. Since the degree
of r(x) is smaller than δ (which is the minimum of the degrees of the polynomials in 𝒟),
r(x) = 0. This shows of course that d(x) divides a(x). The argument that d(x)|b(x) is
similar. We have established existence.

For uniqueness assume that the monic polynomial d_1(x) also satisfies conditions (a) and (b).
We use (a) for d_1(x) to conclude that it divides both a(x) and b(x). Now we use (b) for d(x)
with c(x) = d_1(x) to conclude that d_1(x)|d(x). Similarly d(x)|d_1(x). Since both d(x) and
d_1(x) are monic polynomials, we conclude that d(x) = d_1(x). □


DEFINITION 5.22. As with integers, we call d(x) the greatest common divisor (gcd) of
a(x) and b(x) and say that a(x) and b(x) are relatively prime if d(x) = 1. As with integers,
we define (0,0) = 0.


COROLLARY 5.23 (of proof). For all a(x) and b(x) ∈ C[x], (a(x), b(x)) is a linear combination
of a(x) and b(x); that is, there exist s(x) and t(x) ∈ C[x] such that

(a(x), b(x)) = a(x)s(x) + b(x)t(x).


REMARK 5.24. The above corollary does not tell us how to compute (a(x), b(x)) as a
linear combination of a(x) and b(x). THE GCD ALGORITHM of section 3 of Chapter 1
works for complex polynomials.

Instead of describing the general algorithm, we illustrate (using notation that is a straightforward
translation of the algorithm for integers) with an example of two polynomials a(x) =
x^3 - 3x^2 + 2x and b(x) = 2x^2 - 6x. To give us confidence in our calculation, we use the
fundamental theorem of algebra to factor the polynomials:

a(x) = x(x - 1)(x - 2), b(x) = 2x(x - 3).

The factored forms of the two polynomials tell us immediately that (a(x), b(x)) = x. The
algorithm (without taking advantage of the factorization which is not needed) now reads

\[
\begin{pmatrix} 1 & 0 & x^3 - 3x^2 + 2x \\ 0 & 1 & 2x^2 - 6x \end{pmatrix}
\to
\begin{pmatrix} 1 & 0 & x^3 - 3x^2 + 2x \\ 0 & \tfrac{1}{2} & x^2 - 3x \end{pmatrix}
\to
\begin{pmatrix} 1 & -\tfrac{x}{2} & 2x \\ 0 & \tfrac{1}{2} & x^2 - 3x \end{pmatrix}
\to
\begin{pmatrix} \tfrac{1}{2} & -\tfrac{x}{4} & x \\ 0 & \tfrac{1}{2} & x^2 - 3x \end{pmatrix}.
\]

As is the case with the earlier version of this algorithm, the arrows need not alternate
between the rows of matrices; whereas it was convenient to use this alternating convention
when dealing with integers, it may not be when dealing with polynomials. The aim in this
case is to continue reducing the degrees of the polynomials appearing in the last columns of
the matrices. The meaning of the last arrow should be obvious to the reader. We conclude
from the above calculations that

\[ x = (a(x), b(x)) = \tfrac{1}{2}(x^3 - 3x^2 + 2x) - \tfrac{x}{4}(2x^2 - 6x), \]

as can easily be checked. As we have pointed out, we need not use the fundamental
theorem of algebra to factor the polynomials nor do we need to know what the gcd is in
order to compute it. We compute the gcd of a second set of two polynomials:

a(x) = x^3 - 3x^2 + 2x, b(x) = x^4 - 5x^3 + 7x^2 - 3x.

It is completely obvious that x|(a(x), b(x)); but of course other monic polynomials of degree
1 may also divide (a(x), b(x)). The algorithm reads

\[
\begin{pmatrix} 1 & 0 & x^4 - 5x^3 + 7x^2 - 3x \\ 0 & 1 & x^3 - 3x^2 + 2x \end{pmatrix}
\to
\begin{pmatrix} 1 & -x & -2x^3 + 5x^2 - 3x \\ 0 & 1 & x^3 - 3x^2 + 2x \end{pmatrix}
\]
\[
\to
\begin{pmatrix} 1 & -x + 2 & -x^2 + x \\ 0 & 1 & x^3 - 3x^2 + 2x \end{pmatrix}
\to
\begin{pmatrix} 1 & -x + 2 & -x^2 + x \\ x & -x^2 + 2x + 1 & -2x^2 + 2x \end{pmatrix}.
\]

It is now easily concluded that

x(x - 1) = (a(x), b(x)) = (-1)(x^4 - 5x^3 + 7x^2 - 3x) + (x - 2)(x^3 - 3x^2 + 2x).


LEMMA 5.25. Let a(x) and b(x) ∈ C[x], and assume that b(x) = a(x)q(x) + r(x) for some
q(x) and r(x) ∈ C[x]. Then (a(x), b(x)) = (a(x), r(x)).

PROOF. Let d(x) = (a(x), b(x)). If d(x) = 0, then a(x) = 0 = b(x) and there is nothing
to prove. So assume that d(x) ≠ 0. In this case, d(x)|r(x) and thus d(x)|(a(x), r(x)). But
(a(x), r(x))|b(x) and (a(x), r(x))|a(x); hence (a(x), r(x))|d(x) and thus (a(x), r(x)) =
d(x). □

THEOREM 5.26 (The Euclidean algorithm). Let a(x) and b(x) ∈ C[x] with a(x) ≠
0. Then there exists a unique n ∈ N, unique r_1(x), r_2(x), ..., r_n(x) ∈ C[x] and unique
q_1(x), q_2(x), ..., q_n(x), q_{n+1}(x) ∈ C[x] such that

b(x) = a(x)q_1(x) + r_1(x), 0 ≤ deg (r_1(x)) < deg (a(x))
a(x) = r_1(x)q_2(x) + r_2(x), 0 ≤ deg (r_2(x)) < deg (r_1(x))
r_1(x) = r_2(x)q_3(x) + r_3(x), 0 ≤ deg (r_3(x)) < deg (r_2(x))
...
r_{n-2}(x) = r_{n-1}(x)q_n(x) + r_n(x), 0 ≤ deg (r_n(x)) < deg (r_{n-1}(x))
r_{n-1}(x) = r_n(x)q_{n+1}(x)

and (a(x), b(x)) = r_n(x).


PROOF. The existence and uniqueness of n, and the existence and properties of the
collections of r_i(x) and q_i(x) follow from the division algorithm. The last lemma tells us that

(b(x), a(x)) = (a(x), r_1(x)) = (r_1(x), r_2(x)) = ... = (r_{n-2}(x), r_{n-1}(x)) = (r_{n-1}(x), r_n(x)) = r_n(x). □
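The chain of divisions can be carried out mechanically while keeping track of how each remainder is expressed through a(x) and b(x), which also produces the linear combination of Corollary 5.23. The following is a minimal sketch, assuming rational coefficients stored from degree 0 upward; all function names are illustrative, and the final remainder is rescaled to be monic, a normalization chosen here to match Theorem 5.21.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zeros; the zero polynomial is the empty list."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q, sign=1):
    """Return p + sign * q."""
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + sign * (q[i] if i < len(q) else 0)
                 for i in range(n)])

def mul(p, q):
    """Product of two polynomials."""
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return trim(out)

def divmod_poly(b, a):
    """Quotient and remainder of b by the non-zero (trimmed) polynomial a."""
    q, r = [], trim(b)
    while len(r) >= len(a):
        term = [Fraction(0)] * (len(r) - len(a)) + [r[-1] / a[-1]]
        q = add(q, term)
        r = add(r, mul(term, a), -1)
    return q, r

def extended_gcd(a, b):
    """Return (d, s, t) with d monic and d = a*s + b*t; a, b not both zero."""
    r0, s0, t0 = trim(Fraction(c) for c in a), [Fraction(1)], []
    r1, s1, t1 = trim(Fraction(c) for c in b), [], [Fraction(1)]
    while r1:
        q, r = divmod_poly(r0, r1)
        r0, s0, t0, r1, s1, t1 = (r1, s1, t1, r,
                                  add(s0, mul(q, s1), -1), add(t0, mul(q, t1), -1))
    lead = r0[-1]
    return [c / lead for c in r0], [c / lead for c in s0], [c / lead for c in t0]

a = [0, 2, -3, 1]                                  # x^3 - 3x^2 + 2x
d1, s1, t1 = extended_gcd(a, [0, -6, 2])           # b = 2x^2 - 6x
d2, s2, t2 = extended_gcd(a, [0, -3, 7, -5, 1])    # b = x^4 - 5x^3 + 7x^2 - 3x
```

For the first pair the sketch returns d = x with s = 1/2 and t = -x/4; for the second it returns d = x^2 - x with s = x - 2 and t = -1.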




2.3. Differentiation. 


DEFINITION 5.27. We define formally the derivative p'(x) of the polynomial p(x) of (17)
as the polynomial

p'(x) = a_1 + 2a_2 x + ... + n a_n x^{n-1}.

For each non-negative integer k, the k-th derivative p^{(k)}(x) of the polynomial p(x) is defined
inductively by p^{(0)}(x) = p(x) and p^{(k+1)}(x) is the derivative of p^{(k)}(x). Note that the n-th
derivative of an n-th degree polynomial is a non-zero constant, while for each m > n, its m-th
derivative is zero.


THEOREM 5.28 (Taylor series). Let p(x) be an n-th degree polynomial; then for all z_0 and
A ∈ C,

p(z_0 + A) = p(z_0) + p'(z_0) A + (p^(2)(z_0)/2!) A^2 + ... + (p^(n)(z_0)/n!) A^n.


PROOF. The proof is a long calculation using the binomial theorem that is left to the
reader. Note that we claim here that the proof is a formal calculation that does not require
any analysis. □
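
The formal nature of the statement can be illustrated by a short Python sketch (an illustration under stated assumptions, not a proof): polynomials are lists of integer coefficients with the constant term first, and the names derivative, evaluate and taylor are hypothetical helpers introduced here. Only the formal derivative and exact arithmetic are used; no limits appear.

```python
from fractions import Fraction
from math import factorial

def derivative(p):
    """Formal derivative of a_0 + a_1 x + ... + a_n x^n (coefficients a_0 first)."""
    return [i * c for i, c in enumerate(p)][1:]

def evaluate(p, x):
    """Evaluate p at x by Horner's rule."""
    value = 0
    for c in reversed(p):
        value = value * x + c
    return value

def taylor(p, z0, step):
    """Right-hand side of the Taylor formula: sum over k of p^(k)(z0) * step^k / k!."""
    total, dk = Fraction(0), list(p)
    for k in range(len(p)):
        total += Fraction(evaluate(dk, z0) * step**k, factorial(k))
        dk = derivative(dk)
    return total

p = [1, -2, 0, 3]                 # 3x^3 - 2x + 1
print(evaluate(p, 2 + 5), taylor(p, 2, 5))
```

Both printed values agree, as the theorem predicts for z_0 = 2 and A = 5.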


As an illustration of the power of formal calculations and because it will be needed in 
the section on multiple roots, we establish the following 


THEOREM 5.29. Let p(x), q(x) and r(x) be three polynomials with r(x) = p(x)q(x). Then
r'(x) = p'(x)q(x) + p(x)q'(x).


PROOF. In the arguments that follow, we have ignored the indices of summation, and
coefficients with a negative index should be taken as zero. Let

p(x) = Σ_i a_i x^i,   q(x) = Σ_i b_i x^i   and   r(x) = Σ_i c_i x^i.

Then
p'(x) = Σ_i (i + 1) a_{i+1} x^i,   q'(x) = Σ_i (i + 1) b_{i+1} x^i,   r'(x) = Σ_i (i + 1) c_{i+1} x^i
and
c_i = Σ_j a_j b_{i-j}.
Write

p'(x)q(x) + p(x)q'(x) = Σ_i d_i x^i.
We compute d_i. From the last equation, it follows that
(20)   d_i = Σ_j ((j + 1) a_{j+1} b_{i-j} + (i - j + 1) a_j b_{i-j+1}).
We need to establish that
d_i = (i + 1) c_{i+1} = (i + 1) Σ_j a_j b_{i+1-j};

which follows upon rewriting (20) as

d_i = Σ_j (j + 1) a_{j+1} b_{i-j} + Σ_j (i - j) a_{j+1} b_{i-j} = Σ_j (i + 1) a_{j+1} b_{i-j} = (i + 1) Σ_j a_j b_{i+1-j}.
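
As a sanity check of the coefficient identity, the following Python sketch compares both sides of the product rule on one pair of polynomials. The coefficient-list representation (constant term first) and the helper names are assumptions of this sketch.

```python
def derivative(p):
    """Formal derivative; p lists coefficients a_0, a_1, ..., a_n."""
    return [i * c for i, c in enumerate(p)][1:]

def multiply(p, q):
    """Coefficients c_i = sum over j of a_j * b_(i-j) of the product p*q."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def add(p, q):
    """Coefficientwise sum, padding the shorter list with zeros."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]

p = [1, 2, 3]        # 1 + 2x + 3x^2
q = [5, 0, -1, 4]    # 5 - x^2 + 4x^3
lhs = derivative(multiply(p, q))
rhs = add(multiply(derivative(p), q), multiply(p, derivative(q)))
print(lhs == rhs)
```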




EXERCISES 


(1) Let n be a positive integer, and let p(x), p_1(x), ..., p_n(x) be a collection of n + 1
polynomials with

p(x) = ∏_{i=1}^{n} p_i(x).

Show that

p'(x) = Σ_{j=1}^{n} ∏_{i=1}^{n} q_{ij}(x),

where

q_{ij}(x) = p_i(x) for i ≠ j   and   q_{ij}(x) = p_i'(x) for i = j.

(2) Let n be a positive integer. We study the differential operator
D : C_{n+1}[x] → C_n[x]

defined by sending the polynomial p(x) to the polynomial p'(x), and the integral
operator

I : C_n[x] → C_{n+1}[x]
defined by sending the polynomial a_n x^n + a_{n-1} x^{n-1} + ... + a_0 to the polynomial

(a_n/(n + 1)) x^{n+1} + (a_{n-1}/n) x^n + ... + a_0 x.
• Show that D and I are linear operators.
• Show that D is surjective and that I is injective.
• Show that
I ∘ D : C_n[x] → C_n[x]
has a one dimensional kernel. What is the image of this operator?
• Show that
D ∘ I : C_n[x] → C_n[x]
is the identity operator.
• Does there exist a linear operator
T : C_n[x] → C_{n+1}[x]
such that T ∘ D is the identity?
3. Ideals 


3.1. Ideals in commutative rings. Let (R, +, ·) be a commutative ring and I ⊂ R a
subring. We would like to give the additive cosets

R/I = {a + I; a ∈ R}

a quotient ring structure. Let a and b ∈ R. From our work on quotient groups, we know
that R/I is an abelian group; addition is, of course, defined by

(a + I) + (b + I) = (a + b) + I.

We try to define multiplication analogously by

(a + I)(b + I) = (ab) + I.




We must verify that this is a well defined operation. Toward this end, let a' and b' ∈ R be
such that a - a' and b - b' ∈ I. We need to verify that a'b' - ab ∈ I. We try to do so by
observing that

a'b' - ab = a'b' - a'b + a'b - ab = a'(b' - b) + (a' - a)b.
We know that (b' - b) and (a' - a) ∈ I. But since we only know that a' and b ∈ R (not I),
we are unable to conclude that a'(b' - b) and (a' - a)b ∈ I (which would conclude the proof
that multiplication is well defined). Try as one might, there is no way out of this difficulty
without some additional assumption on I.


DEFINITION 5.30. A non-empty subset I of a commutative ring R is an ideal provided:
(a) for all a and b ∈ I, (a - b) ∈ I, and
(b) for all a ∈ I and all r ∈ R, ar ∈ I.


REMARK 5.31. • Every commutative ring R that contains 2 or more elements has
two ideals: the trivial ideal {0} and the unit ideal R. An ideal I ⊊ R is called
proper.

• Ideals need not be subrings since they need not contain 1.

• If an ideal I in R contains an invertible element of R (then it must also contain 1),
then I = R.

• If I is a proper ideal in the commutative ring R, then R/I is a commutative ring
known as the quotient ring.

• For every commutative ring R, R/{0} ≅ R and R/R = {0} = {0 + R}. The latter
is, of course, not a ring in our definition since it does not contain 1.

• Ideals may also be defined for non-commutative rings. In this case one needs to
distinguish three classes of ideals: left, right and two sided.

• If I_1 and I_2 are ideals in the commutative ring R, then so is

I_1 + I_2 = {a ∈ R; a = a_1 + a_2 with a_i ∈ I_i}.

DEFINITION 5.32. Let R be a commutative ring and a ∈ R; the principal ideal generated
by a is defined by
<a> = aR = {ar; r ∈ R}.
More generally, assume that for some positive integer n, a_1, ..., a_n ∈ R. The ideal generated
by a_1, ..., a_n is
<a_1, ..., a_n> = a_1 R + ... + a_n R = {r_1 a_1 + ... + r_n a_n; r_i ∈ R for i = 1, ..., n}.

The proof that a_1 R + ... + a_n R is an ideal is elementary. Let r_1 a_1 + ... + r_n a_n and
r_1' a_1 + ... + r_n' a_n ∈ a_1 R + ... + a_n R. Then
(r_1 a_1 + ... + r_n a_n) - (r_1' a_1 + ... + r_n' a_n) = (r_1 - r_1') a_1 + ... + (r_n - r_n') a_n ∈ a_1 R + ... + a_n R
and if r ∈ R, then
r(r_1 a_1 + ... + r_n a_n) = (r r_1) a_1 + ... + (r r_n) a_n ∈ a_1 R + ... + a_n R.

PROPOSITION 5.33. Let θ : R → S be a ring homomorphism. Then
(a) ker(θ) = {r ∈ R; θ(r) = 0} is an ideal in R,
(b) Im(θ) is a subring of S,
(c) ker(θ) = {0} if and only if θ is injective, and
(d) θ is injective whenever R is a field.




EXAMPLE 5.34. We discuss three important examples.

• (Z, +, ·) is a subring, hence a subgroup, of (Z[x], +, ·), with respect to the additive
structures. But the two groups are not isomorphic. In this case, we can determine all
group homomorphisms θ : (Z, +) → (Z[x], +). The group Z is cyclic with generator
1. Hence θ(Z) is cyclic with generator θ(1). But Z[x] is certainly not a cyclic group.

• Z is a subring of Z[√2], but not an ideal.

• Let p(x) be a monic polynomial in C[x] of degree n > 0. Let I be the principal
ideal <p(x)>. Then the quotient ring C[x]/I is isomorphic as a vector space to
C_{n-1}[x]. Each equivalence class Q(x) + I ∈ C[x]/I has a canonical representative
as a polynomial q(x) of degree ≤ (n - 1). To verify the last claim assume that the
degree of Q(x) = a x^m + ... is m ≥ n; then Q(x) - a x^{m-n} p(x) is equivalent to Q(x)
and has degree at most m - 1. If m - 1 < n we are done. Otherwise, we iterate
the procedure and eventually find a polynomial q(x) of degree at most n - 1 that is
equivalent to Q(x). It is clear that two distinct polynomials of degree ≤ (n - 1)
cannot be equivalent modulo the ideal I (their difference cannot belong to I). This
example introduces a multiplication structure on polynomials of degree ≤ (n - 1)
that is very different from the multiplicative structure on C[x].

• We continue with the last example with p(x) = x^3 - 1. Here the multiplication (in
terms of canonical representatives for equivalence classes) yields

(a_0 + a_1 x + a_2 x^2)(b_0 + b_1 x + b_2 x^2) = (a_0 b_0 + a_1 b_2 + a_2 b_1) + (a_1 b_0 + a_0 b_1 + a_2 b_2) x + (a_2 b_0 + a_1 b_1 + a_0 b_2) x^2.
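
The multiplication of canonical representatives modulo x^3 - 1 amounts to reducing exponents modulo 3. The Python sketch below (the function names are hypothetical, and integer coefficients are used only for illustration) compares that reduction with the closed formula just displayed.

```python
def multiply_mod_x3_minus_1(a, b):
    """Product of a0 + a1 x + a2 x^2 and b0 + b1 x + b2 x^2 in C[x]/<x^3 - 1>."""
    c = [0, 0, 0]
    for i in range(3):
        for j in range(3):
            c[(i + j) % 3] += a[i] * b[j]   # x^3 = 1, so x^(i+j) = x^((i+j) mod 3)
    return c

def closed_formula(a, b):
    """The three coefficients given by the displayed formula."""
    return [a[0] * b[0] + a[1] * b[2] + a[2] * b[1],
            a[1] * b[0] + a[0] * b[1] + a[2] * b[2],
            a[2] * b[0] + a[1] * b[1] + a[0] * b[2]]

a, b = [1, 2, 3], [4, 5, 6]
print(multiply_mod_x3_minus_1(a, b), closed_formula(a, b))   # both are [31, 31, 28]
```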
3.2. Ideals in Z and C[x].
PROPOSITION 5.35. Every ideal I in Z is principal. 


PROOF. If I is the trivial ideal (<0>) or the unit ideal (<1> = Z), it is certainly
principal. Assume now that I is not the trivial ideal. It thus contains an integer a ≠ 0.
If a < 0, then I also contains (-1)a > 0. Thus the set of integers S = {b ∈ I; b > 0} is
non-empty and bounded from below and hence contains a smallest element d. We claim that
I = <d>. Because d ∈ I, it follows that <d> = dZ ⊂ I. Conversely, if c ∈ I, then by the
division algorithm c = qd + r for some q and r ∈ Z with 0 ≤ r < d. Hence r ∈ I. If r ≠ 0,
then it would also belong to S, which would contradict that d was the smallest element of S.
We conclude that c ∈ <d> and hence I ⊂ <d>.


PROPOSITION 5.36. Every ideal I in C[x] is principal.


PROOF. If I is the trivial ideal (<0>) or the unit ideal (<1> = C[x]), it is certainly
principal. Assume now that I is neither the trivial ideal nor the unit ideal. Let D be the set
of degrees of the non-zero elements of I. It is a non-empty subset of N. It thus contains
a smallest integer d, and d ≠ 0: if d = 0, then I would contain a non-zero constant and would
hence be the unit ideal. Thus there is a polynomial d(x) of degree d in I. We claim that
I = <d(x)>. Because d(x) ∈ I, it follows that <d(x)> = d(x)C[x] ⊂ I. Conversely, if
p(x) ∈ I, then by the division algorithm p(x) = q(x)d(x) + r(x) for some q(x) and r(x) ∈ C[x]
with r(x) = 0 or 0 ≤ deg(r(x)) < d. Now r(x) ∈ I. If r(x) ≠ 0, then its degree would belong
to D and would be smaller than d. This contradiction shows that r(x) = 0 and hence that
p(x) ∈ <d(x)>. Thus I ⊂ <d(x)>.


DEFINITION 5.37. A proper ideal I in a commutative ring R is a prime ideal if for all a
and b ∈ R, ab ∈ I implies that either a or b ∈ I, and is a maximal ideal if whenever N is
another proper ideal with I ⊂ N ⊂ R, then I = N.




REMARK 5.38. The following assertions are easily established.


• An ideal I in a commutative ring R is maximal if and only if R/I is a field.
• An ideal I in R is prime if and only if R/I is an integral domain.


PROOF. As an illustration, we provide a proof of this (and part of the next)
assertion. Say I is a prime ideal. Let a and b ∈ R. If (a + I)(b + I) = I, then ab ∈ I
and hence either a or b must be in I and thus either a + I or b + I is the zero element
of R/I. Conversely, assume that R/I is an integral domain. If ab ∈ I, then
(a + I)(b + I) = I, so either a + I or b + I is the zero element of R/I
and hence either a or b ∈ I. □


• Every maximal ideal is prime. The converse is false.


PROOF. We verify only the first claim. Let I be a maximal ideal in the ring
R. Then R/I is a field, hence certainly an integral domain. Hence I is a prime
ideal. □


THEOREM 5.39. (a) An ideal <d> in the integers Z is prime if and only if d or -d is a prime.
(b) An ideal <d(x)> ⊂ C[x] is prime if and only if d(x) = αx + β, with α ∈ C, α ≠ 0, and β ∈ C.
(c) An ideal I ⊂ Z or I ⊂ C[x] is prime if and only if it is maximal.


PROOF. (a) Without loss of generality, d is a positive integer. Assume that <d> is a
prime ideal. If d were not a prime integer, we could certainly find integers a and b ∈ Z such
that d divides the product ab, but d does not divide either a or b. But then ab ∈ I = <d>
implies that either a or b ∈ I. Thus d divides either a or b. This contradiction establishes
that d is a prime. We leave the proof of the converse as an exercise.

(b) We use the fundamental theorem of algebra (to be established in Chapter 7) to conclude
that the principal ideal <d(x)> is prime if and only if d(x) is a polynomial of degree one.
(c) This is an immediate consequence of the fact that all ideals in either Z or C[x] are principal.


EXERCISES 


(1) Let d be a prime integer. Prove that <d> is a prime ideal in Z.

(2) Prove that an ideal in Z or in C[x] is prime if and only if it is maximal.

(3) Exhibit some of the differences between Z[x] and C[x].

(4) Prove that for every positive integer n, the quotient ring Z/nZ is isomorphic to the
ring Z_n.

(5) Compute the gcd of the polynomials 7+ + 1 and x? + 1.

(6) Let n be a positive integer. Define the gcd d(x) of n polynomials p_1(x), p_2(x), ..., p_n(x) ∈
C[x] and show that the ideal generated by these polynomials is the principal
ideal generated by the gcd.

(7) Describe all homomorphisms θ from the integers (Z, +) to an arbitrary group. In
particular, is θ(Z) cyclic and what are the possible orders of such groups?


4. CRT revisited 
THEOREM 5.40 (The Chinese remainder theorem). Let m_1, m_2, ..., m_r be r > 0 pairwise
relatively prime positive integers. The map
θ : Z_{m_1 m_2 ... m_r} → Z_{m_1} × Z_{m_2} × ... × Z_{m_r}
defined by
θ([a]_{m_1 m_2 ... m_r}) = ([a]_{m_1}, [a]_{m_2}, ..., [a]_{m_r})
is a ring isomorphism.


PROOF. We need to show that θ is well defined and a ring isomorphism. The map is
well defined: if for a and b ∈ Z, we have that [a]_{m_1 m_2 ... m_r} = [b]_{m_1 m_2 ... m_r}, then for each
i, [a]_{m_i} = [b]_{m_i}. It is clear that θ preserves additive and multiplicative structures of the
respective rings and is thus a ring homomorphism. The map θ is injective: if for a and b ∈ Z
and for each i we have that [a]_{m_i} = [b]_{m_i}, then m_i | (a - b) for each i and hence, because the
m_i are pairwise relatively prime, m_1 m_2 ... m_r | (a - b). The
surjectivity of the map θ is a set theoretic consequence of two facts: (1) the map is injective
and (2) |Z_{m_1 m_2 ... m_r}| = |Z_{m_1} × Z_{m_2} × ... × Z_{m_r}|.
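
For small moduli the theorem can be confirmed by brute force. The Python sketch below uses the moduli 3, 4, 5 (chosen here only for illustration; the name theta is a stand-in for the map θ). It checks that θ is a bijection and that it respects addition and multiplication of representatives; it does not construct the inverse map.

```python
from math import prod

def theta(a, moduli):
    """The map [a] -> ([a] mod m_1, ..., [a] mod m_r), on a representative a."""
    return tuple(a % m for m in moduli)

moduli = (3, 4, 5)                 # pairwise relatively prime
M = prod(moduli)                   # 60

images = {theta(a, moduli) for a in range(M)}
is_bijective = len(images) == M    # M distinct images among M possible tuples

def componentwise(op, s, t):
    """Apply op coordinate by coordinate, reducing modulo each m_i."""
    return tuple(op(x, y) % m for x, y, m in zip(s, t, moduli))

is_homomorphism = all(
    theta((a + b) % M, moduli) == componentwise(lambda x, y: x + y, theta(a, moduli), theta(b, moduli))
    and theta((a * b) % M, moduli) == componentwise(lambda x, y: x * y, theta(a, moduli), theta(b, moduli))
    for a in range(M) for b in range(M)
)
print(is_bijective, is_homomorphism)
```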


REMARK 5.41. We discuss some connections to previous results (that such connections 
must exist is implied by the name of the theorem, for example). 

• Unlike the earlier version of the Chinese remainder theorem (Theorem 1.70), the
above proof produces the existence of the solution, but does not provide an algorithm
for finding it.

• Recall Theorem 4.37.


EXERCISES 


(1) Supply the details to show that the map θ of the last theorem preserves both the
additive and multiplicative structures.
(2) Construct an inverse for the map θ.


5. Polynomials over more general fields 


Throughout this section K is a field that contains Q and is contained in C. All of our work
on C[x], in the previous sections, that does not involve the fundamental theorem of algebra
applies to K[x]. In this section we explore some of the differences, especially concepts needed
in Chapter 9. As usual, we start with a


DEFINITION 5.42. Let p(x) ∈ K[x] have positive degree. We say that the polynomial
p(x) is irreducible over K if given a factorization p(x) = f(x)g(x) with f(x) and g(x) ∈ K[x],
then either f(x) or g(x) ∈ K (that is, at least one of them must have degree 0).


REMARK 5.43. The concept of irreducibility depends on the field K. 


• Over C, the only polynomials of positive degree that are irreducible are those of
degree one (by the fundamental theorem of algebra).
• The polynomial of degree two x^2 + 1 is irreducible over R.


THEOREM 5.44. Every polynomial in K[x] of positive degree can be expressed as a product
of a constant λ ∈ K and irreducible monic polynomials p_1(x), ..., p_r(x) ∈ K[x]. In such a
product, the constant and polynomials are uniquely determined up to rearrangement.


DEFINITION 5.45. A field K is algebraically closed if every polynomial in K [x] of positive 
degree has a root in K. 


COROLLARY 5.46. If K is algebraically closed, then every p(x) ∈ K[x] of positive degree
n has a factorization (19). The constant λ ∈ K is unique; so are the roots α_i, up to
rearrangement.


REMARK 5.47. • The field of complex numbers C is algebraically closed as a result
of the fundamental theorem of algebra (discussed in Chapter 7), but Q and R are
not.

• Finite dimensional vector spaces over the field with two elements Z_2 will be important
in our study of error correcting codes in Chapter 6.

• For an arbitrary commutative ring R, we can form its polynomial ring R[x]. It is
easy to see that R[x] is an integral domain whenever R is.


6. Fields of quotients and rings of rational functions 


In Sub-section 5.1 of Chapter 1 we discussed the construction of the rationals from the 
integers. That construction is quite general. 


DEFINITION 5.48. Let R be an integral domain. We introduce an equivalence relation
≡ on S = R × R_{≠0} by declaring (a, b) to be equivalent to (α, β) provided aβ = bα; we write
a/b as a representative for the equivalence class of (a, b) ∈ R × R_{≠0}. We let Q be the set
of equivalence classes so obtained and call it the field of quotients of R. Addition + and
multiplication · in Q are defined as in the construction of Q from Z.


PROPOSITION 5.49. (a) (Q, +, ·) is a field.
(b) The map θ that sends a ∈ R to a/1 ∈ Q is an injective ring homomorphism. We identify
R with θ(R) and can hence view it as a subring of the field Q.
(c) If a and b ∈ R with b invertible, then a/b = ab^{-1}.


For an arbitrary commutative ring R, we have defined a polynomial ring R[x]. We can
also define the ring of rational functions R(x) as the set of formal expressions ρ(x) = p(x)/q(x)
with p(x) and q(x) ∈ R[x] and q(x) ≠ 0. We can define the binary operations + and · on
R(x) that extend the corresponding operations on R[x] and turn R(x) into a commutative
ring. We can thus view R[x] as a subring of R(x) and also view ρ as a function from
{r ∈ R; q(r) ≠ 0} to R.

In the most interesting cases R is an integral domain and we can form the field of quotients
Q_1 of the integral domain R[x] as well as the field of quotients Q_2 of the integral domain
R(x). We can also construct the field of quotients Q of R and hence also Q[x] and Q(x).
One checks at this point that the various constructions are related. In particular, the fields
Q_1, Q_2 and Q(x) are more or less the same object.


CHAPTER 6 


Error correcting codes 


We apply the material we have developed to study error detecting and error correcting 
codes. 


1. ISBN 


A code is just a number. We have discussed methods for transmitting codes that cannot
be deciphered by unauthorized listeners. We now turn to a different issue. How can the
receiver be sure that the information received is identical with that sent? And if there is a 
transmission error, how can it be corrected? To what degree of certainty? We start with a 
discussion of 


EXAMPLE 6.1. The International Standard Book Number (ISBN) is a sequence of nine
integers a_1 a_2 ... a_8 a_9, where 0 ≤ a_i ≤ 9, for each i, together with a check digit a (thus
a_1 a_2 ... a_8 a_9 a), where a is either an integer between 0 and 9 (inclusive) or the symbol X which
stands for the integer 10. The inclusion of this 10-th digit gives a check on the other 9. It is
constructed as the representative of the congruence class of

-(10a_1 + 9a_2 + ... + 3a_8 + 2a_9) = -Σ_{i=1}^{9} (11 - i)a_i mod 11

chosen, as usual (for us), between the integers 0 and 10 (inclusive). If a_i is erroneously
transmitted as β instead of its correct value α, and all the other digits, including the check
digit, are transmitted correctly, then the receiver computes as check digit γ = a - (11 -
i)(β - α) mod 11 and since 11 is prime and

11 - i ≢ 0 mod 11 and (β - α) ≢ 0 mod 11,

we see that also (11 - i)(β - α) ≢ 0 mod 11. Thus γ ≠ a and the receiver knows that
there is an error in the transmission. What the receiver does NOT know is which digit is
wrong. It could, of course, be the check digit. Similarly, if the sender interchanges a_i with
a_j and say that 1 ≤ i, j ≤ 9, then the receiver computes -(11 - i)a_j - (11 - j)a_i instead
of -(11 - i)a_i - (11 - j)a_j as the contribution of these two terms to the check digit. The
difference
-(11 - i)(a_j - a_i) - (11 - j)(a_i - a_j) = (i - j)(a_j - a_i)
is congruent to 0 (mod 11) if and only if a_i = a_j, and again a single error is detected.
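
The check digit and the two detection claims can be tried out with a few lines of Python. This is a sketch under stated assumptions: the nine digits 0, 3, 0, 6, 4, 0, 6, 1, 5 are sample data chosen for illustration, the function name check_digit is hypothetical, and the value 10 stands for the symbol X.

```python
def check_digit(digits):
    """Check digit a = -(10 a_1 + 9 a_2 + ... + 2 a_9) mod 11 for nine digits a_1..a_9."""
    return -sum((11 - i) * d for i, d in enumerate(digits, start=1)) % 11

digits = [0, 3, 0, 6, 4, 0, 6, 1, 5]    # sample data
a = check_digit(digits)
print(a)                                # 2

# A single wrong digit always changes the computed check digit.
single_errors_detected = all(
    check_digit(digits[:i] + [beta] + digits[i + 1:]) != a
    for i in range(9) for beta in range(10) if beta != digits[i]
)

# Interchanging two different digits is detected as well.
def swapped(ds, i, j):
    ds = list(ds)
    ds[i], ds[j] = ds[j], ds[i]
    return ds

swaps_detected = all(
    check_digit(swapped(digits, i, j)) != a
    for i in range(9) for j in range(i + 1, 9) if digits[i] != digits[j]
)
print(single_errors_detected, swaps_detected)
```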


2. Groups and codes 


Codes and information, in general, are usually transmitted in binary rather than decimal
mode (notation). Thus a message is a finite sequence of zeros and ones; for example, 00011 or
10100.


DEFINITION 6.2. We fix a positive integer n. A word of length n is a point in Z_2^n. We
write such a word as a = a_1 a_2 ... a_n, where each a_i is either 0 or 1.


REMARK 6.3. The simplest finite field with two elements (Z_2, +, ·) has lots of structure.
We will denote it from now on by B. In particular, we fix a positive integer n and study B^n,
the abelian group (under addition) of order 2^n; it is also a vector space over B of dimension
n. (We will not use this additional structure nor that it is a Boolean algebra^1.) The addition
table for the group B^n is rather simple: if the element a ∈ B^n is the n-tuple a_1 a_2 ... a_n and
the element b ∈ B^n is the n-tuple b_1 b_2 ... b_n, then
a + b is the n-tuple c_1 c_2 ... c_n, where c_i = 0 if and only if a_i = b_i. Thus each element of this
group is its own inverse. The (scalar) product of the scalar λ ∈ B with the vector w ∈ B^n
is also quite simple: λw = 0 if λ = 0 and λw = w if λ = 1.


To formalize the concept of check-digits, we consider a word w of positive length m and
transmit instead a code word f(w) of length n which should have redundant information
to enable us to detect and perhaps correct transmission errors. We thus are considering a
coding function

f : B^m → B^n.
Whenever necessary we have at our disposal a list of all possible code words.

In order for the coding function to enable us to recover the original word w from the
code word f(w), it is necessary that f be an injective mapping; in particular, that n ≥ m.
In practice we take n > m. WE ASSUME FROM NOW ON THAT f IS INJECTIVE AND
THAT n > m. In practice there is a necessary trade off: the bigger n is, the more redundancy
we have and the easier it should be to catch errors and correct mistakes; however, it is more
expensive to transmit longer messages.


EXAMPLE 6.4. We begin by considering prototypes for the two simplest examples of 
coding functions. 

• We take n = m + 1 and define f(w) = wx, where for w = a_1 a_2 ... a_m, x = Σ_{i=1}^{m} a_i,
where the sum is evaluated in B. Thus the check digit x is 0 if the number of
non-zero digits in the word w is even and 1 otherwise. If exactly one error is made
in transmission, it will certainly be detected. But we cannot correct it, since we do
not know where it is; it might be the check digit (so the message we receive, wx
with x stripped from it, is correct although we cannot be sure of it^2). In general this
coding function will detect an odd number of transmission errors, but not an even
number. How bad is this? Assume the probability p of a transmission error in any
single digit is 1 in 1000 (.001 or 10^{-3}). (In practice it is much smaller.) If our
word w is of length five (m = 5), then the probability of precisely 2 transmission
errors is C(6, 2)(.001)^2(.999)^4 = .0000149..., much smaller than the probability of
precisely one error, 6(.001)(.999)^5 = .00597....

• Let us now take n = 3m and define f(w) = www. When we receive a code word
(of length 3m) we break it up into 3 words of length m: abc. If a = b = c we
can be fairly certain but not sure that w = a. Say that one mistake was made
in the transmission of a and then the same mistake was made in the transmission
of b and c. What is the probability of this happening? Using the same set of
values for m and p as in the previous example, we evaluate this probability as
5((.001)(.999)^4)^3 = .0000 0000 49...; quite an unlikely event.

^1 We have not defined this structure.
^2 Even if the check digit shows no obvious errors, there may be some.

• We will use the next two examples in much of our subsequent development. We will
hence refer to them in the sequel as standard small examples 1 and 2, respectively.
We use m = 4 in the first of these and add a check digit before transmitting the word.
The transition from word to code word in the second example will be explained later.
For these examples, we consider the maps f : B^4 → B^5 and g : B^3 → B^6 defined
by the following two tables.

x    | f(x)
0000 | 00000
0001 | 00011
0010 | 00101
0011 | 00110
0100 | 01001
0101 | 01010
0110 | 01100
0111 | 01111
1000 | 10001
1001 | 10010
1010 | 10100
1011 | 10111
1100 | 11000
1101 | 11011
1110 | 11101
1111 | 11110

and

x   | g(x)
000 | 000000
001 | 001111
010 | 010101
011 | 011010
100 | 100111
101 | 101000
110 | 110010
111 | 111101

The lists of elements x ∈ B^4 and B^3 appearing in the first columns of the above
tables are shown in lexicographic (dictionary) ordering. The reader should check
that the table for f represents the first of our examples with m = 4. We will see
below that these two examples are special cases of a family of codes.
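
The check requested of the reader, and the probabilities quoted earlier in this example, can be reproduced with a short Python sketch. Words are represented as strings of 0s and 1s, and the names parity_encode and table are hypothetical choices made for this illustration.

```python
from itertools import product
from math import comb

def parity_encode(w):
    """Append the check digit x = (sum of the digits of w) mod 2 to the word w."""
    return w + str(sum(int(c) for c in w) % 2)

# Table for f with m = 4; product() lists the words in lexicographic order.
table = {"".join(bits): parity_encode("".join(bits)) for bits in product("01", repeat=4)}
for word, code_word in table.items():
    print(word, "|", code_word)

p = 0.001                                        # probability of an error in a single digit
prob_two = comb(6, 2) * p**2 * (1 - p)**4        # precisely two errors among six digits
prob_one = 6 * p * (1 - p)**5                    # precisely one error among six digits
prob_repeat = 5 * (p * (1 - p)**4)**3            # same single error in each copy of www, m = 5
print(prob_two, prob_one, prob_repeat)
```

The printed table agrees with the table for f above, and the three probabilities agree with the values .0000149..., .00597... and .0000 0000 49... quoted in the text.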


We introduce some preliminaries in order to discuss more efficient codes than the next 
to last example. 


DEFINITION 6.5. We define the weight of a word w = a_1 a_2 ... a_m ∈ B^m as

wt(w) = Σ_{i=1}^{m} a_i,

where the sum is in Z; thus the weight of a word is the number of ones in its binary expansion,

0 ≤ wt(w) ≤ m

with equality 0 = wt(w) if and only if w = 0. We thus have a map

wt : B^m → Z_{≥0}.

The distance d(v, w) between words v and w ∈ B^m is
d(v, w) = wt(v - w) = wt(v + w);

thus the distance between the words v = b_1 b_2 ... b_m and w is precisely the number of places
where they differ (that is,

d(v, w) = |{i = 1, 2, ..., m; a_i ≠ b_i}| = Σ_{i=1}^{m} |a_i - b_i|,

where the last sum is again in Z). As in the case of weights, d(w, v) = 0 if and only if w = v.


REMARK 6.6. The distance function provides us with a map
d : B^m × B^m → Z_{≥0}

that satisfies the usual properties of distance functions studied in analysis: For all u, v and
w ∈ B^m,

• d(w, v) = 0 if and only if w = v.

• d(w, v) = d(v, w).

• d(u, w) ≤ d(u, v) + d(v, w).
Its values land in Z_{≥0}, a subset of R_{≥0}, where the “normal” distance functions of analysis
take their values. The distance function d is translation invariant: that is,

d(u - v, w - v) = d(u, w).


THEOREM 6.7. Let k be a positive integer. A coding function f : B^m → B^n detects k
or fewer errors if and only if the minimum distance between distinct code words is at least
k + 1.


PROOF. Say we have received a message w ∈ B^n, whereas v ∈ B^n was the intended
(correct) message. The code word v is in our list of possible code words. We need to
know, of course, that such a list exists. If there are k or fewer errors in our message, then
d(v, w) ≤ k. So unless v = w, w is not in our list of code words and we have detected an
error if and only if the minimum distance between code words is ≥ (k + 1).


THEOREM 6.8. Let k be a positive integer. A coding function f : B^m → B^n allows the
correction of k or fewer errors if and only if the minimum distance between distinct code
words is at least 2k + 1.


PROOF. If the distance between distinct code words is at least 2k + 1, then by the
previous theorem, we can detect up to and including 2k transmission errors. But even if we
had as few as k + 1 transmission errors, there may be two distinct code words in B^n that
are within distance k + 1 of the erroneous message we received, so we cannot be sure
how to correct the error. However, there is at most one code word within distance k of the
erroneous message. So if the transmission had at most k errors, there is precisely one code
word within distance k of the message.


EXAMPLE 6.9. For our standard small example 1, an examination of the differences 
between code words shows that the minimum distance between distinct code words is 2. 
(This involves computing 15+ 14+ ...+1 = 120 differences and then checking their weights.) 
Hence this coding function can detect one error, but cannot correct it. For small example 2, 
the minimum distance between distinct code words is also 2. 
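
The examination of all pairs described in this example is easily delegated to a machine. In the Python sketch below, code words are strings of 0s and 1s; the list code1 is generated from the parity rule of standard small example 1, the list code2 is copied from the table for g, and the function names are hypothetical.

```python
from itertools import combinations

def distance(v, w):
    """Number of places where the words v and w differ."""
    return sum(a != b for a, b in zip(v, w))

def min_distance(code):
    """Minimum distance over all pairs of distinct code words."""
    return min(distance(v, w) for v, w in combinations(code, 2))

# Standard small example 1: four message digits followed by a parity check digit.
code1 = [format(i, "04b") + str(format(i, "04b").count("1") % 2) for i in range(16)]
# Standard small example 2: the second column of the table for g.
code2 = ["000000", "001111", "010101", "011010",
         "100111", "101000", "110010", "111101"]

print(len(list(combinations(code1, 2))))          # 120 pairs for example 1
print(min_distance(code1), min_distance(code2))   # 2 2
```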




DEFINITION 6.10. Let f : B^m → B^n be a coding function. We say that f is a group or
linear code if f(B^m) is a subgroup of B^n.


REMARK 6.11. If f : B^m → B^n is a group homomorphism, then it certainly is a linear
code.


THEOREM 6.12. If f : B^m → B^n is a linear code, the minimum distance between distinct
code words is the minimum of the weights of non-zero code words.


PROOF. Let d and d' be the minima of the distances between distinct code words and of
the weights of non-zero code words, respectively; that is,
d = min {d(v, w); v and w ∈ f(B^m), v ≠ w}
and
d' = min {wt(v); v ∈ f(B^m), v ≠ 0}.
Both d and d' exist (and belong to Z_{>0}) since they are minima of non-empty (finite) sets of
positive integers. Since f is a linear code, 0 ∈ f(B^m), and thus we conclude from the fact
that for all v ∈ f(B^m), wt(v) = d(v, 0), that
d ≤ d'.

Also there exist code words u and v ∈ f(B^m) (hence also u + v ∈ f(B^m)) such that u ≠ v
and
d = d(u, v) = wt(u - v) = wt(u + v) ≥ d'.


EXAMPLE 6.13. Standard small examples 1 and 2 are linear codes. The first set of 
code words is the group generated by the words 00011, 00101, 01001 and 10001; the second 
by 001111, 010101 and 100111. For linear codes, the last theorem certainly simplifies the 
calculations of the minimal distances between code words. For our standard small example 
1, we need to examine only 15 words (instead of 120 differences between words). 
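
Both claims of this example, that the two sets of code words are groups and that the minimum distance equals the minimum weight of a non-zero code word, can be checked with the following Python sketch. Words are strings of 0s and 1s and all function names are hypothetical.

```python
def add(v, w):
    """Componentwise sum in B = Z_2 of two words of equal length."""
    return "".join("0" if a == b else "1" for a, b in zip(v, w))

def weight(w):
    """Number of ones in the word w."""
    return w.count("1")

def is_group_code(code):
    """A non-empty finite subset of B^n closed under addition is a subgroup."""
    words = set(code)
    return all(add(v, w) in words for v in words for w in words)

def min_nonzero_weight(code):
    return min(weight(w) for w in code if weight(w) > 0)

def min_distance(code):
    words = sorted(set(code))
    return min(weight(add(v, w)) for i, v in enumerate(words) for w in words[i + 1:])

code1 = [format(i, "04b") + str(format(i, "04b").count("1") % 2) for i in range(16)]
code2 = ["000000", "001111", "010101", "011010",
         "100111", "101000", "110010", "111101"]

for code in (code1, code2):
    print(is_group_code(code), min_nonzero_weight(code), min_distance(code))
```

For each code the script prints True followed by two equal numbers, in agreement with Theorem 6.12.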


We describe a useful method for producing group codes f as group homomorphisms. 


DEFINITION 6.14. Let m and n be positive integers with m < n. An m × n matrix (thus
with m rows and n columns) G with entries in B is a generator matrix if its first m columns
form the m × m identity matrix I = I_m. Thus G = [I_m, A] where A is an m × (n - m) matrix
of zeros and ones.


EXAMPLE 6.15. Examples of generator matrices with m = 1,2,3,4 andn =m-+1 are 


10001 
Leis. 
Ta A 30 01001 
[ 1 0}. | le 0 1 0 0] and 


Another generator matrix is

[ 1 0 0 1 1 1 ]
[ 0 1 0 1 0 1 ]
[ 0 0 1 1 1 1 ].

3This extra fact is not needed. 




DEFINITION 6.16. View the elements of B^m as 1 × m matrices (row vectors). An m × n
generator matrix G defines, using matrix multiplication, the group code (homomorphism^4)

f_G : B^m ∋ w ↦ wG ∈ B^n.
We will say that G is the generator matrix for the code f_G.


EXAMPLE 6.17. The last two matrices are generator matrices for our two standard small 
examples. 


REMARK 6.18. A basis (over B) of the vector space B^m consists of the m row vectors v_i,
i = 1, 2, ..., m, consisting of a one in the i-th column and m - 1 zeros (in the other columns, of
course). The image f_G(v_i) of v_i under the map f_G is the i-th row of the matrix G. The vector
subspace f_G(B^m) of B^n is hence spanned by the m rows of the matrix G. We also observe
that for each w ∈ B^m, the first m letters of the code word f_G(w) consist of the word w;
thus f_G(w) = wv, where v ∈ B^{n-m} constitutes a set of check digits and f_G is injective. The
property of group homomorphisms (or equivalently of linear maps between vector spaces)
that we find most useful is

f_G(v + w) = f_G(v) + f_G(w) for all v and w ∈ B^m.
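
The map w ↦ wG is ordinary matrix multiplication with the arithmetic done in B. The Python sketch below is an illustration under stated assumptions: the matrices G1 and G2 are built, following the remark above, by taking as i-th row the code word of the i-th basis vector in the tables for the standard small examples 1 and 2, and the names encode, G1 and G2 are hypothetical.

```python
def encode(w, G):
    """Row vector w (a list of m bits) times the m x n matrix G, computed in B = Z_2."""
    return [sum(w[i] * G[i][j] for i in range(len(G))) % 2 for j in range(len(G[0]))]

# Rows are the code words of the basis vectors 1000, 0100, 0010, 0001 (example 1).
G1 = [[1, 0, 0, 0, 1],
      [0, 1, 0, 0, 1],
      [0, 0, 1, 0, 1],
      [0, 0, 0, 1, 1]]
# Rows are the code words of the basis vectors 100, 010, 001 (example 2).
G2 = [[1, 0, 0, 1, 1, 1],
      [0, 1, 0, 1, 0, 1],
      [0, 0, 1, 1, 1, 1]]

print(encode([0, 1, 1, 1], G1))   # [0, 1, 1, 1, 1], the code word 01111
print(encode([1, 0, 1], G2))      # [1, 0, 1, 0, 0, 0], the code word 101000

# The first m letters of each code word repeat the word itself.
v, w = [1, 1, 0], [0, 1, 1]
s = [(a + b) % 2 for a, b in zip(v, w)]
print(encode(s, G2) == [(a + b) % 2 for a, b in zip(encode(v, G2), encode(w, G2))])
```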


THEOREM 6.19. Let θ : B^m → B^n be an arbitrary homomorphism. Then there exists an
(m × n) generator matrix G such that θ = f_G.


PROOF. (This theorem is really a standard result from linear algebra.) We may of course
view θ as a B-linear map from the vector space (over B) B^m to the vector space B^n. The
vectors v_i form a basis for B^m. The i-th row of the matrix G is then θ(v_i) ∈ B^n.


EXAMPLE 6.20. The last three matrices in Example 6.15 are the generator matrices for 
the codes f and g described in the third set of codes in Example 6.4. 


We describe a useful way to proceed with an error detection and correction procedure.
Assume that we are using a group code f : B^m → B^n, not necessarily given as a group
homomorphism. We let W ⊂ B^n be the set of code words; a subgroup of B^n. Let d be the
minimum of the distances between distinct code words in W. Suppose we receive a word v ∈ B^n
with a single error in it. It differs from a word w ∈ W by a basis element v_i ∈ B^n. So
instead of receiving a word w ∈ W, we have received the word v_i + w in the coset v_i + W.
Similarly, if we receive a word with precisely two errors, then a code word in the group W
has been transformed by mistake into a word in the coset v_i + v_j + W with i ≠ j. In general
a word with precisely k transmission errors involves replacing W by v_{i_1} + v_{i_2} + ... + v_{i_k} + W,
where the integers i_j may be assumed to satisfy 1 ≤ i_1 < i_2 < ... < i_k ≤ n.

Assume that we receive a word v. If it is a code word, we can be fairly certain that 
it is error free. It could of course contain transmission errors, necessarily more than one if 
we have check digits, that transforms one code word to another. If the minimum distance” 
between code words is (d =) 7, then we would need at least 7 errors to change one code 
word to another — a very unlikely possibility. Say v is not a code word. Thus it contains at 
least one error. We want to correct the error, without requesting that the word be resent. 


4The map fa is also B-linear (a stronger property); that is, a linear map from the vector space B™ to 
the vector space B”. 
>We will be making parenthetical remarks about such an example as we go along. 


2. GROUPS AND CODES 137 


It turns out that we cannot be absolutely sure that we are correcting the erroneous word to
the correct code word. However, in practice, the probability that we have corrected an error
is very close to 1. We use what is known as the maximum likelihood decoding procedure: we
replace $v$ by a code word closest to it. To do so, we compute the set of distances $d(v, w)$
for all $w \in W$ and choose as our replacement word $w_v$ one that minimizes this set of
distances; if there is more than one such $w_v$, choose one arbitrarily. (In our example, $w_v$ is
unique if $v$ has fewer than 4 transmission errors.)
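The nearest-code-word rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the generator matrix `G_DEMO` is a hypothetical 2 x 5 example (it is not the code used in the text), words are stored as tuples of 0s and 1s, and ties are broken by taking the first minimizer, which is one admissible "arbitrary" choice.

```python
from itertools import product

def encode(message, G):
    """Return the code word message*G over B = Z_2 (row-vector convention)."""
    return tuple(sum(message[i] * G[i][j] for i in range(len(G))) % 2
                 for j in range(len(G[0])))

def code_words(G):
    """List all code words, i.e. the image of B^m under w -> wG."""
    return [encode(s, G) for s in product((0, 1), repeat=len(G))]

def distance(u, v):
    """Hamming distance: the number of positions where u and v differ."""
    return sum(a != b for a, b in zip(u, v))

def nearest_code_word(v, W):
    """Maximum likelihood decoding: a code word of W closest to v."""
    return min(W, key=lambda w: distance(v, w))

# Hypothetical generator matrix, chosen only for this illustration.
G_DEMO = [[1, 0, 1, 1, 0],
          [0, 1, 0, 1, 1]]
W_DEMO = code_words(G_DEMO)
```

For instance, `nearest_code_word((1, 0, 1, 0, 0), W_DEMO)` compares the received word with all four code words and returns the one at distance 1.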

Instead of constructing a w, for each v we receive, we can calculate in advance a decoding 
table as follows. 

We start with a coding function $f : B^m \to B^n$ for which the code words form a subgroup
$W$ of $B^n$.

• The decoding table $T$ is a $2^{n-m} \times 2^m$ matrix (thus consisting of $2^n = o(B^n)$ entries;
each of these entries is a word in $B^n$).

• We list in a single row the $2^m$ elements in the group $W$, starting with the identity⁶
$0 = 000\ldots000$ of $W$. This is the first row of the matrix $T$. It is convenient to label
$0 = x_1$.

• Find a word $x_2$ in $B^n - W$ of minimal weight (there will, in general, be more than
one of these; choose one). Add $x_2$ to (equivalently, subtract $x_2$ from) the entries in
the first row of $T$ to obtain its second row. Observe that above each word in the
second row is the word in the first row that is closest to it. The entries in the second
row of $T$ hence list the elements in the coset $x_2 + W$. Of course, $W \cap (x_2 + W) = \emptyset$.

• Find a word $x_3$ in $B^n - W - (x_2 + W)$ of minimal weight. The third row in the
matrix $T$ is constructed as the coset $x_3 + W$. We now observe that, in addition
to $W \cap (x_3 + W) = \emptyset$, we also have that $(x_2 + W) \cap (x_3 + W) = \emptyset$; for if $v \in
(x_2 + W) \cap (x_3 + W)$, then $v = x_2 + w_2 = x_3 + w_3$ for some $w_2$ and $w_3 \in W$ and it
follows that $x_3 = x_2 + (w_2 - w_3) \in x_2 + W$, contrary to our choice of $x_3$.

• We keep repeating the above procedure.

• After $r$ steps, we have used $2^m r$ elements of $B^n$, arranged into $r$ rows, the $r$-th row
consisting of the coset $x_r + W$. We observe that

$$(x_j + W) \cap (x_r + W) = \emptyset, \quad \text{for } j = 1, \ldots, r-1.$$

We have listed all the elements of $B^n$ after $2^{n-m}$ steps and at this point we have
completed the construction of the decoding table $T$.

• We call the words $x_i$, $i = 1, \ldots, 2^{n-m}$, the coset leaders of the decoding table; they
are the entries in the first column of the matrix $T$.

• We receive a word $v$. It certainly is an element of $B^n$; hence it appears in our decoding table
$T$. Say the word $v$ is the $(i, j)$ entry of the matrix $T$. The code word (in $W$) nearest
to $v$ that we use is then the $(1, j)$ entry of $T$.
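The construction above can be sketched as follows in Python. This is a minimal sketch rather than the program used in the text: the function names are hypothetical, words are tuples of 0s and 1s, the list `W` is assumed to start with the zero word, and among words of minimal weight the lexicographically smallest is chosen as coset leader (the text allows any choice).

```python
from itertools import product

def add(u, v):
    """Componentwise addition in B^n (addition mod 2)."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

def decoding_table(W, n):
    """Rows are the cosets x_i + W; the first row is W itself."""
    remaining = set(product((0, 1), repeat=n)) - set(W)
    table = [list(W)]
    while remaining:
        # Coset leader: a word of minimal weight not yet listed.
        leader = min(remaining, key=lambda x: (sum(x), x))
        row = [add(leader, w) for w in W]
        table.append(row)
        remaining -= set(row)
    return table

def decode(v, table):
    """Return the (1, j) entry, where v is the (i, j) entry of the table."""
    for row in table:
        if v in row:
            return table[0][row.index(v)]
    raise ValueError("word not found in decoding table")

# Illustration with the triple repetition code in B^3 (an assumed example).
W_REP = [(0, 0, 0), (1, 1, 1)]
TABLE_REP = decoding_table(W_REP, 3)
```

With this example the table has $2^{3-1} = 4$ rows, and `decode((1, 0, 1), TABLE_REP)` returns the code word `(1, 1, 1)`.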


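We illustrate the above construction for the group code $f_G : B^4 \to B^7$ defined by the
generator matrix

$$G = \begin{pmatrix} 1&0&0&0&0&1&1\\ 0&1&0&0&1&1&0\\ 0&0&1&0&1&1&1\\ 0&0&0&1&1&1&1 \end{pmatrix}.$$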


6It is convenient but not necessary to do so.




The lengthy calculations are best performed on a computer. A sample (not simple) MAPLE 
program is shown below. We follow the program with some explanatory notes. 


MAPLE SESSION #10 


> with(linalg):
# dirprod: Cartesian product of two lists (or sets) of words
> dirprod:=proc(A::{list,set},B::{list,set})
    local AxB,Ai,Bi,i,j;
    AxB:=[];
    for i from 1 to nops(A) do
      Ai:=op(A[i]);
      for j from 1 to nops(B) do
        Bi:=op(B[j]);
        AxB := [op(AxB), [ Ai, Bi ]];
      od;
    od;
  end:
# selfprod: n-fold Cartesian product of a list with itself
> selfprod:=proc(S::{list},n::posint)
    local A,m;
    if (n=1) then
      return(S);
    fi;
    if (modp(n,2)=0) then
      return( selfprod(dirprod(S,S),n/2));
    else
      return( dirprod(S,selfprod(S,n-1)));
    fi;
  end:

> listA := selfprod([0,1],4):

> listB := selfprod([0,1],7):

> G := matrix([[1,0,0,0,0,1,1],[0,1,0,0,1,1,0],[0,0,1,0,1,1,1],[0,0,0,1,1,1,1]]);

             [1 0 0 0 0 1 1]
        G := [0 1 0 0 1 1 0]
             [0 0 1 0 1 1 1]
             [0 0 0 1 1 1 1]

# gpW: the group W of code words, computed as xG reduced mod 2
> g := x -> multiply(x,G): list1 := map(g,listA): mod2 := x -> modp(x,2):
  listmod2 := l -> map(mod2,l): gpW := map(listmod2,list1):
> SetMinus:=(A,B)->remove(x->inlist(x,B),A):
> inlist:=proc(x,L)
    local i;
    for i from 1 to nops(L) do
      if (equal(L[i],x)) then return(true) fi;
    od;
    false;
  end:
> list2 := SetMinus(listB,gpW): nops(listB); nops(list2);
                                 128
                                 112

> wt := x -> sum(x[k],k=1..7): listwtgpW := map(wt,gpW);

    listwtgpW := [0, 4, 4, 2, 3, 3, 3, 5, 3, 3, 3, 5, 4, 4, 4, 6]

# lowestwt: a word of minimal weight in the list L
> lowestwt:=proc(L) local result, i; result := L[1]; for i from 2 to
  nops(L) do if (wt(L[i]) < wt(result)) then result := L[i] fi; od;
  eval(result); end proc:

> x2 := lowestwt(list2):

# addvect: the coset x + L, computed mod 2
> addvect:=proc(x,L) local fcn, preres; fcn := y -> matadd(x,y); preres
  := map(fcn,L); map(listmod2,preres) end proc:

> coset2 := addvect(x2,gpW):

> list3 := SetMinus(list2,coset2):
> x3 := lowestwt(list3):
> coset3 := addvect(x3,gpW):

> list4 := SetMinus(list3,coset3):
> x4 := lowestwt(list4):
> coset4 := addvect(x4,gpW):

> list5 := SetMinus(list4,coset4):
> x5 := lowestwt(list5):
> coset5 := addvect(x5,gpW):

> list6 := SetMinus(list5,coset5):
> x6 := lowestwt(list6):
> coset6 := addvect(x6,gpW):

> list7 := SetMinus(list6,coset6):
> x7 := lowestwt(list7): coset7 := addvect(x7,gpW):

> list8 := SetMinus(list7,coset7): x8 := lowestwt(list8): coset8 :=
  addvect(x8,gpW):

> nops(SetMinus(list8,coset8));
                                  0

> decodingmatrix := array(1..8,1..16,[gpW, coset2, coset3, coset4,
  coset5, coset6, coset7, coset8]);

  0000000 0001111 0010111 0011000 0100110 0101001 0110001 0111110 1000011 1001100 1010100 1011011
  0000001 0001110 0010110 0011001 0100111 0101000 0110000 0111111 1000010 1001101 1010101 1011010
  0000010 0001101 0010101 0011010 0100100 0101011 0110011 0111100 1000001 1001110 1010110 1011001
  0000100 0001011 0010011 0011100 0100010 0101101 0110101 0111010 1000111 1001000 1010000 1011111
  0001000 0000111 0011111 0010000 0101110 0100001 0111001 0110110 1001011 1000100 1011100 1010011
  0100000 0101111 0110111 0111000 0000110 0001001 0010001 0011110 1100011 1101100 1110100 1111011
  1000000 1001111 1010111 1011000 1100110 1101001 1110001 1111110 0000011 0001100 0010100 0011011
  0000101 0001010 0010010 0011101 0100011 0101100 0110100 0111011 1000110 1001001 1010001 1011110

> H := matrix([[0,1,1],[1,1,0],[1,1,1],[1,1,1],[1,0,0],[0,1,0],[0,0,1]]);

             [0 1 1]
             [1 1 0]
             [1 1 1]
        H := [1 1 1]
             [1 0 0]
             [0 1 0]
             [0 0 1]

> result := array(1..8,1..2): for k to 8 do for l to 2 do if (l = 2)
  then result[k,l] := eval(decodingmatrix[k,1]) else result[k,l] :=
  listmod2(eval(multiply(decodingmatrix[k,1],H))) end if end do end do:

> print(result);

                          [000 0000000]
                          [001 0000001]
                          [010 0000010]
                          [100 0000100]
                          [111 0001000]
                          [110 0100000]
                          [011 1000000]
                          [101 0000101]


***END MAPLE PROGRAM*** 


• The first item in the program consists of two parts; the first computes the Cartesian
product of two sets or lists and the second uses this procedure to compute the n-th
Cartesian product of a list.

• The second and third commands compute $B^4$ and $B^7$, respectively.

• The next step enters the matrix $G$.

• The next set of instructions executes the mod 2 multiplication of matrices to obtain
the group $W$. We have suppressed the printing of $W$ as well as the cosets of $W$ in
$B^7$ in the subsequent commands because they appear as the rows of the decoding
matrix.

• The next two sets of commands compute the relative complement of one list with
respect to another. The standard MAPLE set difference command is inappropriate
for our purposes.

• The computations of the cosets of $W$ in $B^7$ need several preliminaries. One needs to
compute relative differences of sets (the listn entries), minimal weights of sets of
words, and finally the cosets. This is done in the subsequent set of commands. As
a check on our work, we have computed the cardinality of listB and list2 using
the nops command.

• We have displayed the weights of the code words in $W$. Since the lowest non-zero
weight of elements of $W$ is 2, the coding function will detect 1 error, but will correct
none.

• The eight $x_i$ are the coset leaders.

• The lists list8 and coset8 contain the same words in a different order; this fact is
verified by the nops(SetMinus(list8,coset8)) command.

• We have shown only 12 of the 16 columns of the decoding matrix.

• The role of the last two commands should be obvious after we develop a little more
theory.

We now proceed to a discussion of correcting errors with less information than a complete 
decoding table. 


DEFINITION 6.21. Let $G = [I_m, A]$ be an $m \times n$ generator matrix. The corresponding
parity-check matrix is the $n \times (n - m)$ matrix $H = \begin{bmatrix} A \\ I_{n-m} \end{bmatrix}$. The syndrome of a word
$w \in B^n$ is the word $wH \in B^{n-m}$.


PROPOSITION 6.22. Let $H$ be a parity-check matrix corresponding to the generator matrix
$G$. Then $w \in B^n$ is a code word if and only if $wH = 0$.


PROOF. A word $w \in B^n$ is a code word if and only if $w = sG = s[I_m, A]$ for some $s \in B^m$,
if and only if $w = uv$ with $u \in B^m$ and $v = uA$. We rewrite the last equation as

$$0 = uA - vI_{n-m} = uA + vI_{n-m} = (uv)\begin{bmatrix} A \\ I_{n-m} \end{bmatrix} = wH,$$

showing that $w$ is a code word if and only if $0 = wH$.


COROLLARY 6.23. Two words are in the same row of the coset decoding table if and only
if they have the same syndrome.


PROOF. Two words $u$ and $v \in B^n$ are in the same row of the decoding table (in the same
coset of the code group $W$) if and only if they differ by a code word; that is, if and
only if $w = u - v \in W$, if and only if $0 = wH = uH - vH$, if and only if $uH = vH$.


REMARK 6.24. A parity-check matrix $H$ defines a linear mapping

$$H : B^n \ni v \mapsto vH \in B^{n-m}.$$

This linear mapping need not be injective nor surjective. If $H$ is the parity-check matrix
corresponding to the generator matrix $G$, then we also have the injective (not surjective)
linear mapping (we called it $f_G$ before)

$$G : B^m \ni w \mapsto wG \in B^n.$$

The last proposition shows that $H \circ G$ is the zero map.


We can now expand the decoding table for a group code by adding an extra column, say
at the left, that records the syndrome of each row. Thus the first two entries of the first
row of the expanded decoding table $T$, which is now a $2^{n-m} \times (2^m + 1)$ matrix, are
$0 \in B^{n-m}$ and $0 \in B^n$. We can dispense with all but the first two columns of this expanded
decoding table and label the resulting $2^{n-m} \times 2$ matrix $T$. If we receive a word $v$, we first
compute its syndrome $vH$, which we find in the first column of our two-column matrix $T$. In the
next column, in the same row as $vH$, is the coset leader $u$ of the row in the full decoding
table where $v$ is found. Adding $u$ to $v$ we obtain $u + v$, a code word closest to the word
$v$ that we received. Stripping away the last $n - m$ digits from the word $u + v$ we obtain the
maximum likelihood candidate for the word $w$ that we believe was intended.
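The two-column procedure can be sketched in Python for the code $f_G : B^4 \to B^7$ of the MAPLE session. The block `A` is read off from the matrix `G` entered there; the function names, and the rule of picking the lexicographically smallest word of minimal weight as coset leader, are assumptions made for this illustration. Because this code has minimum weight 2, a single error is not always corrected to the word that was sent; the sketch only returns the maximum likelihood candidate.

```python
from itertools import product

A = [[0, 1, 1], [1, 1, 0], [1, 1, 1], [1, 1, 1]]   # G = [I_4, A]
M, N = 4, 7
G = [[int(i == j) for j in range(M)] + A[i] for i in range(M)]
H = A + [[int(i == j) for j in range(N - M)] for i in range(N - M)]

def times(word, matrix):
    """Row vector times matrix over B = Z_2."""
    return tuple(sum(word[i] * matrix[i][j] for i in range(len(matrix))) % 2
                 for j in range(len(matrix[0])))

def syndrome_table(H, n):
    """Map each syndrome to a coset leader (a word of minimal weight)."""
    table = {}
    # Visiting words by increasing weight makes the first word seen
    # with a given syndrome a minimal-weight element of its coset.
    for word in sorted(product((0, 1), repeat=n), key=lambda x: (sum(x), x)):
        table.setdefault(times(word, H), word)
    return table

TABLE = syndrome_table(H, N)

def decode(received):
    """Add the coset leader for the syndrome, then strip the check digits."""
    leader = TABLE[times(received, H)]
    corrected = tuple((a + b) % 2 for a, b in zip(received, leader))
    return corrected[:M]
```

For example, the message `(1, 0, 1, 1)` is encoded as `(1, 0, 1, 1, 0, 1, 1)`; if the last digit is flipped in transmission, the syndrome is `(0, 0, 1)` and `decode` recovers the message.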


EXERCISES 


(1) If $f : B^m \to B^n$ is a linear code, must it also be a group homomorphism?

(2) Show that $\mathrm{wt} : B^n \to \mathbb{Z}$ is not a group homomorphism but that

$$\mathrm{red}_2 \circ \mathrm{wt} : B^n \to \mathbb{Z}_2$$
is.

(3) In this section words were considered as row vectors of length $m$ and hence the
coding function $f$ for a group code was a map $w \mapsto wG$ for some generator
matrix $G$. How would you define a generator matrix and the corresponding parity-check
matrix if words were viewed as column vectors of length $m$?


CHAPTER 7 


Roots of polynomials 


Among the main purposes of this chapter is to discuss unified approaches (formulae) for
solving polynomial equations of degree $\le 4$. We outline several approaches to establishing
a key result: the fundamental theorem of algebra. Along the way we continue the study
of the ring of polynomials, emphasising once again that it shares many properties with the
ring of integers. For this chapter, we assume that the reader is familiar with some basic
linear algebra; for example, the contents of [2]. THE MATERIAL IN THIS CHAPTER IS
IN PRELIMINARY FORM.


1. Roots of polynomials 


The main result of this section is the next theorem, the fundamental theorem of
algebra. We present several proofs. Each of them requires some analysis.


THEOREM 7.1 (The fundamental theorem of algebra, FTA). For all $n \in \mathbb{Z}_{>0}$, an $n$-th
degree complex polynomial $p(x)$ has precisely $n$ complex roots counting multiplicities; thus
there exist constants $0 \ne \lambda$ and $\beta_1, \ldots, \beta_n \in \mathbb{C}$ such that

$$p(x) = \lambda(x - \beta_1)\cdots(x - \beta_n).$$

We start with 


DEFINITION 7.2. A zero or root of a polynomial $p(x)$ of positive degree is a complex
number $\alpha$ such that $p(\alpha) = 0$.


It is an immediate consequence of the Euclidean algorithm that if $\alpha$ is a root of the
polynomial $p(x)$ of degree $n > 0$, then there exists a unique polynomial $q(x)$ of degree $n - 1$
such that

$$p(x) = (x - \alpha)q(x).$$

1.0.1. A linear algebra approach. We outline a recent argument due to H. Derksen [4],
based mostly on linear algebra. While this approach requires considerable algebraic tools
(which are outlined without proof), it depends on very little analysis (complete details
provided).

LEMMA 7.3. Every real polynomial $p(x)$ of odd degree has a real zero.

PROOF. This standard fact is proved in most calculus courses. The argument goes as
follows. It involves no loss of generality to assume that $p(x)$ is monic. Thus

$$\lim_{x \to \infty} p(x) = \infty \quad \text{and} \quad \lim_{x \to -\infty} p(x) = -\infty.$$
It follows that there exists an $R > 0$ such that
$$p(R) > 0 \quad \text{and} \quad p(-R) < 0.$$

By the intermediate value theorem there exists a $\lambda$ in the open interval $(-R, R)$ such that
$p(\lambda) = 0$.
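The intermediate value argument can be turned into a numerical sketch by bisection. The code below is an illustration only: it assumes a monic real polynomial of odd degree given by its lower-order coefficients, it takes $R = 1 + |a_1| + \dots + |a_n|$ (the bound that appears in the exercises of this section), and the function names and the fixed number of bisection steps are choices made here.

```python
def evaluate(coeffs, x):
    """Evaluate x^n + a_1 x^(n-1) + ... + a_n by Horner's rule,
    where coeffs = [a_1, ..., a_n]."""
    value = 1.0
    for a in coeffs:
        value = value * x + a
    return value

def real_root_odd(coeffs, steps=200):
    """Approximate a real zero of a monic real polynomial of odd degree."""
    if len(coeffs) % 2 == 0:
        raise ValueError("the degree must be odd")
    R = 1.0 + sum(abs(a) for a in coeffs)
    lo, hi = -R, R            # p(lo) < 0 < p(hi) for this choice of R
    for _ in range(steps):
        mid = (lo + hi) / 2
        if evaluate(coeffs, mid) > 0:
            hi = mid          # keep p(hi) > 0
        else:
            lo = mid          # keep p(lo) <= 0
    return (lo + hi) / 2
```

For $p(x) = x^3 - 2x - 5$ one would call `real_root_odd([0.0, -2.0, -5.0])`, which searches the interval $(-8, 8)$.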




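LEMMA 7.4. Every complex number $z = \alpha + \beta i$, $\alpha$ and $\beta \in \mathbb{R}$, has a square root.


PROOF. Put $\gamma = \sqrt{\alpha^2 + \beta^2} = |z|$. The existence of square roots of non-negative real
numbers is a basic property of the real number system; a fact from calculus. Then

$$\left(\sqrt{\frac{\gamma + \alpha}{2}} + \sigma i \sqrt{\frac{\gamma - \alpha}{2}}\right)^2 = \alpha + \beta i,$$

where $\sigma = 1$ if $\beta \ge 0$ and $\sigma = -1$ if $\beta < 0$.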
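As a numerical illustration of the lemma, the following Python sketch builds a square root of $\alpha + \beta i$ from real square roots only, using the standard formula with $\gamma = |z|$; the function name is hypothetical, and the `max(..., 0.0)` guards are there only to absorb floating-point rounding.

```python
import math

def complex_sqrt(alpha, beta):
    """A square root of alpha + beta*i computed with real square roots only."""
    gamma = math.hypot(alpha, beta)                    # gamma = |z|
    real = math.sqrt(max(gamma + alpha, 0.0) / 2)
    imag = math.sqrt(max(gamma - alpha, 0.0) / 2)
    if beta < 0:                                       # sign of the imaginary part
        imag = -imag
    return complex(real, imag)
```

Squaring the result of `complex_sqrt(3.0, -4.0)` should give back $3 - 4i$ up to rounding.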


We begin an outline of the algebraic tools needed in this approach.


DEFINITION 7.5. Let $K$ be a field and $V$ a $K$-vector space. A $K$-linear self map $L$ of $V$
is called an endomorphism of $V$. A scalar $\lambda \in K$ is an eigenvalue of $L$ if there exists a vector
$x \in V$, $x \ne 0$, called an eigenvector of $\lambda$ or of $L$, such that $L(x) = \lambda x$.


We introduce a statement P(K,d,r) for a field K and positive integers d and r: Any 
r commuting endomorphisms of a K-vector space V of dimension n such that d does not 
divide n have a common eigenvector. 


LEMMA 7.6. If P(K,d,1) holds, then so does P(K,d,r) for all positive integers r. 


LEMMA 7.7. $P(\mathbb{R}, 2, r)$ holds for all positive integers $r$; that is, any collection $A_1, A_2, \ldots,
A_r$ of commuting endomorphisms of an odd dimensional real vector space has a common
eigenvector.


LEMMA 7.8. $P(\mathbb{C}, 2, 1)$ holds; that is, every endomorphism of an odd dimensional complex
vector space has an eigenvector.


LEMMA 7.9. $P(\mathbb{C}, 2^k, r)$ holds for all positive integers $k$ and $r$.


The above series of technical results leads to a theorem that is of interest in its own right.


THEOREM 7.10. Let $r$ be a positive integer. If $A_1, A_2, \ldots, A_r$ are commuting
endomorphisms of a non-trivial finite dimensional $\mathbb{C}$-vector space $V$, then they have a common
eigenvector.


As a consequence (corollary in some sense) of the last theorem, we can now establish the 
fundamental theorem of algebra in the following form: 


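COROLLARY 7.11 (The fundamental theorem of algebra). If $p(x)$ is a non-constant polynomial
with complex coefficients, then there exists a $\beta \in \mathbb{C}$ such that $p(\beta) = 0$.

PROOF. It suffices to assume that $p(x)$ is a monic polynomial of degree $n \ge 1$:
(21) $p(x) = x^n + a_1 x^{n-1} + \dots + a_{n-1}x + a_n.$
We claim that $p(x) = \det(xI - A)$, where $A$ is the companion matrix of $p(x)$:

$$A = \begin{pmatrix} 0 & 0 & \cdots & 0 & -a_n \\ 1 & 0 & \cdots & 0 & -a_{n-1} \\ 0 & 1 & \cdots & 0 & -a_{n-2} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & -a_1 \end{pmatrix}.$$

We use induction on $n$ to verify that

$$x^n + a_1 x^{n-1} + \dots + a_n = \det \begin{pmatrix} x & 0 & 0 & \cdots & 0 & 0 & a_n \\ -1 & x & 0 & \cdots & 0 & 0 & a_{n-1} \\ 0 & -1 & x & \cdots & 0 & 0 & a_{n-2} \\ \vdots & & & \ddots & & & \vdots \\ 0 & 0 & 0 & \cdots & -1 & x & a_2 \\ 0 & 0 & 0 & \cdots & 0 & -1 & x + a_1 \end{pmatrix}.$$

The formula for the base case $n = 1$ reads
$$x + a_1 = \det [x + a_1],$$
which is obviously true. We assume now that $n > 1$. Expanding in terms of minors (along the
first column), we see that the determinant on the right-hand side equals

$$x \det \begin{pmatrix} x & 0 & \cdots & 0 & 0 & a_{n-1} \\ -1 & x & \cdots & 0 & 0 & a_{n-2} \\ \vdots & & \ddots & & & \vdots \\ 0 & 0 & \cdots & -1 & x & a_2 \\ 0 & 0 & \cdots & 0 & -1 & x + a_1 \end{pmatrix} + \det \begin{pmatrix} 0 & 0 & \cdots & 0 & 0 & a_n \\ -1 & x & \cdots & 0 & 0 & a_{n-2} \\ \vdots & & \ddots & & & \vdots \\ 0 & 0 & \cdots & -1 & x & a_2 \\ 0 & 0 & \cdots & 0 & -1 & x + a_1 \end{pmatrix}.$$

The induction hypothesis tells us that

$$\det \begin{pmatrix} x & 0 & \cdots & 0 & 0 & a_{n-1} \\ -1 & x & \cdots & 0 & 0 & a_{n-2} \\ \vdots & & \ddots & & & \vdots \\ 0 & 0 & \cdots & -1 & x & a_2 \\ 0 & 0 & \cdots & 0 & -1 & x + a_1 \end{pmatrix} = x^{n-1} + a_1 x^{n-2} + \dots + a_{n-1},$$

while another expansion in terms of minors and the fact that the determinant of an upper
triangular matrix is the product of the diagonal elements yields

$$\det \begin{pmatrix} 0 & 0 & \cdots & 0 & 0 & a_n \\ -1 & x & \cdots & 0 & 0 & a_{n-2} \\ \vdots & & \ddots & & & \vdots \\ 0 & 0 & \cdots & -1 & x & a_2 \\ 0 & 0 & \cdots & 0 & -1 & x + a_1 \end{pmatrix} = (-1)^{n} a_n \det \begin{pmatrix} -1 & x & \cdots & 0 \\ 0 & -1 & \cdots & 0 \\ \vdots & & \ddots & x \\ 0 & 0 & \cdots & -1 \end{pmatrix} = a_n.$$

The last two equalities finish the induction argument. The last theorem tells us that $A$ has an
eigenvalue; that is, there exists a $\beta \in \mathbb{C}$ such that $p(\beta) = 0$.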
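The identity $p(x) = \det(xI - A)$ can be spot-checked by machine. The Python sketch below is an illustration under stated assumptions: it uses exact rational arithmetic, a plain Gaussian elimination for the determinant, and hypothetical function names; it checks the identity at sample points rather than proving it. Agreement at more than $n$ points does force equality of two monic polynomials of degree $n$.

```python
from fractions import Fraction

def companion(coeffs):
    """Companion matrix of x^n + a_1 x^(n-1) + ... + a_n, coeffs = [a_1, ..., a_n]."""
    n = len(coeffs)
    A = [[Fraction(0)] * n for _ in range(n)]
    for i in range(1, n):
        A[i][i - 1] = Fraction(1)                 # ones below the diagonal
    for i in range(n):
        A[i][n - 1] = Fraction(-coeffs[n - 1 - i])  # last column: -a_n, ..., -a_1
    return A

def det(matrix):
    """Determinant by Gaussian elimination over the rationals."""
    M = [row[:] for row in matrix]
    n, result = len(M), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            result = -result                      # a row swap changes the sign
        result *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= factor * M[c][k]
    return result

def char_poly_at(A, x):
    """det(xI - A) evaluated at the number x."""
    n = len(A)
    return det([[(x if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)])

def poly_at(coeffs, x):
    """x^n + a_1 x^(n-1) + ... + a_n evaluated by Horner's rule."""
    value = Fraction(1)
    for a in coeffs:
        value = value * x + a
    return value
```

For example, with `coeffs = [2, -3, 5, 7]` one can compare `char_poly_at(companion(coeffs), x)` with `poly_at(coeffs, x)` for several integers `x`.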


REMARK 7.12. The above corollary implies Theorem 7.1 by induction on $n \in \mathbb{Z}_{>0}$. The
base case $n = 1$ holds. So if $n > 1$ and $p(x)$ is a polynomial of degree $n$, then by our last
corollary, there is a $\beta_n \in \mathbb{C}$ such that $p(\beta_n) = 0$. By the division algorithm,

$$p(x) = (x - \beta_n)q(x) + r(x),$$
where $q(x)$ is a complex polynomial of degree $n - 1$ and $r(x) \in \mathbb{C}$. Since $p(\beta_n) = 0$, $r(x) = 0$.
The induction hypothesis tells us that $q(x)$ factors as required.


1.0.2. A topological (real analysis) approach. 


LEMMA 7.13 (d'Alembert). If $p(x)$ is a non-constant polynomial and $p(z_0) \ne 0$, then any
ball about $z_0$ contains a point $z_1$ with $|p(z_1)| < |p(z_0)|$.


PROOF. We use the Taylor series of the polynomial $p(x)$ about $z_0$. Let $n$ be the degree of $p$. Since
the polynomial is not constant, there exists a smallest integer $k$, $1 \le k \le n$, such that
$p^{(k)}(z_0) \ne 0$. Thus (as a polynomial in $\Delta$)

$$p(z_0 + \Delta) = p(z_0) + a\Delta^k + \epsilon\Delta^{k+1},$$

where $a$ is a non-zero complex number and $\epsilon$ is a polynomial in $\Delta$ of degree $n - k - 1$ (if
$n = k$, then $n - k - 1$ should be interpreted as $-\infty$). Now let us think of $\Delta$ as a complex
number of small (certainly much less than 1) absolute value, which can be made even
smaller as our argument proceeds. By choosing $\Delta$ sufficiently small we can certainly make
$|\epsilon\Delta^{k+1}| < \frac{1}{2}|p(z_0)|$. By making $|\Delta|$ even smaller, if necessary, we can also make sure
that $|a\Delta^k| < \frac{1}{2}|p(z_0)|$. It readily follows that $0 < |p(z_0 + \Delta)| < 2|p(z_0)|$. We would like to
eliminate the 2 from the last inequality. We have one more degree of freedom at our disposal:
CHANGING THE ARGUMENT OF THE COMPLEX NUMBER $\Delta$. So let $\theta_0$, $\theta$ and $\theta_1$ be
the initial arguments of $p(z_0)$, $\Delta$ and $a\Delta^k + \epsilon\Delta^{k+1}$, respectively, chosen to lie in the interval
$[0, 2\pi)$. As we crank up the argument of $\Delta$ from $\theta$ to $\theta + 2\pi$, the argument of $a\Delta^k + \epsilon\Delta^{k+1}$
changes continuously from $\theta_1$ to $\theta_1 + 2\pi n$ for some positive integer $n$. (This is not obvious; an
analysis proof is needed.) By the intermediate value theorem there is a $\gamma \in [\theta, \theta + 2\pi]$ so that
if we choose $\Delta$ to have argument $\gamma$, the complex number $a\Delta^k + \epsilon\Delta^{k+1}$ will have argument
$\theta_0 + \pi$ (modulo $2\pi$). This means that the vectors $p(z_0)$ and $a\Delta^k + \epsilon\Delta^{k+1}$ point in opposite directions and
hence (TO BE CONTINUED).


We are once again ready to prove 


THEOREM 7.14 (FTA). Every non-constant polynomial has a root. 




PROOF. NEED ARGUMENT HERE.


REMARK 7.15. Another proof of FTA using some analytic steps, similar to those used 
above, can be found in [9]. 


1.0.3. A complex analysis approach. By far the most elegant proof of FTA is through 
complex analysis. See for example [6]. 


EXERCISES 


(1) Let $F_1$ and $F_2$ be subfields of $\mathbb{C}$ that contain $\mathbb{Q}$. Show that $F_1 \cap F_2$ is also a subfield of
$\mathbb{C}$ that contains $\mathbb{Q}$. Conclude that if $\zeta_1, \zeta_2, \ldots, \zeta_n$ is an arbitrary finite collection of
complex numbers, then there exists a unique smallest (by inclusion) subfield $F \subset \mathbb{C}$
that contains each $\zeta_i$, $i = 1, 2, \ldots, n$.

(2) Let $p(x)$ be a monic polynomial of degree $n \ge 1$. Let $\zeta_i$, $i = 1, 2, \ldots, n$, be the roots
of $p(x)$.
(a) Show that there exists a unique smallest field $F \subset \mathbb{C}$ that contains $\mathbb{Q}$ and $\zeta_i$ for $i =
1, 2, \ldots, n$.
(b) Does $F$ contain the coefficients of the polynomial $p(x)$?
(3) Let $n$ be an odd positive integer and let $p(x)$ of (21) be a monic real polynomial of degree
$n$. Let

$$a = |a_1| + \dots + |a_n| + 1.$$

Show that $p(a) > 0$ and $p(-a) < 0$. Hence there exists a real number $\lambda \in (-a, a)$
such that $p(\lambda) = 0$.


1.1. Derivatives and multiple roots. SECTION TO BE COMPLETED LATER. 


2. Circulant matrices 
Fix a positive integer $n \ge 2$, and let
$$v = (v_0, v_1, \ldots, v_{n-1})$$
be a row vector in $\mathbb{C}^n$. Define a shift operator $T : \mathbb{C}^n \to \mathbb{C}^n$ by
$$T(v_0, v_1, \ldots, v_{n-2}, v_{n-1}) = (v_{n-1}, v_0, \ldots, v_{n-2}).$$

The circulant matrix associated to $v$ is the $n \times n$ matrix whose rows are given by iterations
of the shift operator acting on $v$, that is to say, the matrix whose $k$-th row is given by $T^{k-1}v$,
$k = 1, \ldots, n$. Such a matrix will be denoted by

(22) $V = \mathrm{circ}\{v_0, v_1, \ldots, v_{n-1}\} = \mathrm{circ}\{v\}.$
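THEOREM 7.16. Let $v = (v_0, v_1, \ldots, v_{n-1})$ be a vector in $\mathbb{C}^n$, and $V = \mathrm{circ}\{v\}$. If $\epsilon$ is a
primitive $n$-th root of unity, then

(23) $\det V = \det \begin{pmatrix} v_0 & v_1 & \cdots & v_{n-2} & v_{n-1} \\ v_{n-1} & v_0 & \cdots & v_{n-3} & v_{n-2} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ v_2 & v_3 & \cdots & v_0 & v_1 \\ v_1 & v_2 & \cdots & v_{n-1} & v_0 \end{pmatrix} = \prod_{l=0}^{n-1} \left( \sum_{j=0}^{n-1} \epsilon^{jl} v_j \right).$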




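PROOF. We view the matrix $V = \mathrm{circ}\{v_0, v_1, \ldots, v_{n-1}\}$ as a self map (linear operator)
of $\mathbb{C}^n$. For each integer $l$, $0 \le l \le n - 1$, let $x_l \in \mathbb{C}^n$ be the transpose of the row vector¹
below, and let $\lambda_l$ be the accompanying scalar:
(24) $x_l = \frac{1}{\sqrt{n}} \left(1, \epsilon^l, \epsilon^{2l}, \ldots, \epsilon^{(n-1)l}\right), \qquad \lambda_l = v_0 + \epsilon^l v_1 + \dots + \epsilon^{(n-1)l} v_{n-1}.$

A calculation shows that
$$\begin{pmatrix} v_0 & v_1 & \cdots & v_{n-2} & v_{n-1} \\ v_{n-1} & v_0 & \cdots & v_{n-3} & v_{n-2} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ v_2 & v_3 & \cdots & v_0 & v_1 \\ v_1 & v_2 & \cdots & v_{n-1} & v_0 \end{pmatrix} \begin{pmatrix} 1 \\ \epsilon^l \\ \vdots \\ \epsilon^{(n-2)l} \\ \epsilon^{(n-1)l} \end{pmatrix} = \lambda_l \begin{pmatrix} 1 \\ \epsilon^l \\ \vdots \\ \epsilon^{(n-2)l} \\ \epsilon^{(n-1)l} \end{pmatrix}.$$
Thus $\lambda_l$ is an eigenvalue of $V$ with normalized eigenvector $x_l$. Since the $n$ vectors $x_0, x_1, \ldots, x_{n-1}$
are linearly independent, they form a basis for $\mathbb{C}^n$. We conclude, by a standard result from
linear algebra, that the matrix $V$ is diagonalizable and that

$$\det V = \prod_{l=0}^{n-1} \lambda_l.$$

Let $D_V$ be the diagonal matrix with diagonal entries $\lambda_0, \lambda_1, \ldots, \lambda_{n-2}, \lambda_{n-1}$, respectively.
Then there exists² an $n \times n$ invertible matrix $C$ such that
(25) $C^{-1} V C = D_V.$
Thus the matrices
$$V = \mathrm{circ}\{v_0, v_1, \ldots, v_{n-1}\} \quad \text{and} \quad D_V = \mathrm{diag}(\lambda_0, \lambda_1, \ldots, \lambda_{n-1})$$
are conjugate³.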
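The eigenvalue computation in this proof is easy to check numerically. The Python sketch below is an illustration, not part of the original argument: it fixes $\epsilon = e^{2\pi i/n}$, indexes rows and columns from 0, and uses hypothetical function names.

```python
import cmath

def circulant(v):
    """The matrix circ{v}: row k is the k-fold shift of v, so the
    (k, j) entry is v[(j - k) mod n] (rows and columns indexed from 0)."""
    n = len(v)
    return [[v[(j - k) % n] for j in range(n)] for k in range(n)]

def eigenpairs(v):
    """Pairs (lambda_l, x_l) with lambda_l = sum_j eps^(l*j) v_j and
    x_l = n^(-1/2) (1, eps^l, ..., eps^((n-1)l))."""
    n = len(v)
    eps = cmath.exp(2j * cmath.pi / n)
    pairs = []
    for l in range(n):
        lam = sum(eps ** (l * j) * v[j] for j in range(n))
        x = [eps ** (l * k) / n ** 0.5 for k in range(n)]
        pairs.append((lam, x))
    return pairs

def residual(V, lam, x):
    """Largest entry of V x - lam x in absolute value."""
    n = len(x)
    return max(abs(sum(V[k][j] * x[j] for j in range(n)) - lam * x[k])
               for k in range(n))
```

For `v = [1, 2, 3, 4]` the four residuals should vanish up to rounding, and the product of the four $\lambda_l$ should agree with $\det V$.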


REMARK 7.17. It is possible to always use $\epsilon = e^{2\pi i/n}$. The $\varphi(n)$ distinct primitive $n$-th
roots of unity are then $\{\epsilon^k;\ k \in \mathbb{Z},\ 1 \le k < n,\ (n, k) = 1\}$.


DEFINITION 7.18. For $\epsilon = e^{2\pi i/n}$, we call the set $\{\lambda_0, \ldots, \lambda_{n-1}\}$ defined by (24), the ordered
eigenvalues of the circulant matrix (22).


COROLLARY 7.19. The characteristic polynomial of $V$ is

$$p_V(x) = \det(xI - V) = \prod_{l=0}^{n-1} (x - \lambda_l).$$


COROLLARY 7.20. We have $\sum_{l=0}^{n-1} \lambda_l = n v_0$.


1We reserve the symbols $\lambda_l$ and $x_l$ for this eigenvalue and eigenvector throughout this chapter. We use
the convention that, unless otherwise specified, all vectors are column matrices. To avoid too many empty
spaces, we will often write them as row matrices without mentioning that we are considering the transpose
of the column vector. This identification should not cause any confusion. In a sense, it was already used
in defining the shift operator $T$. In line with this convention, matrices, when viewed as linear operators,
multiply column vectors on the left.

2We will describe this matrix shortly.

3The diagonal matrix with entries $a_1, a_2, \ldots, a_{n-1}, a_n$ is denoted by $\mathrm{diag}(a_1, a_2, \ldots, a_{n-1}, a_n)$.




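PROOF. Since

$$\sum_{l=0}^{n-1} \epsilon^{lj} = \begin{cases} n & \text{for } j = 0, \\ 0 & \text{for } j = 1, \ldots, n-1, \end{cases}$$

we see that

$$\sum_{l=0}^{n-1} \lambda_l = \sum_{l=0}^{n-1} \sum_{j=0}^{n-1} \epsilon^{lj} v_j = \sum_{j=0}^{n-1} v_j \sum_{l=0}^{n-1} \epsilon^{lj} = n v_0.$$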

REMARK 7.21. The trace of a square matrix is the sum of its diagonal entries. Since
the trace is a conjugacy class invariant, the last corollary also follows from the identity
$\sum_{l=0}^{n-1} \lambda_l = \operatorname{trace} V$, since the sum is the trace of $D_V$.


DEFINITION 7.22. Let Circ($n$) and Diag($n$) be the sets of all $n \times n$ complex circulant
and diagonal matrices, respectively, viewed as subsets of $M_n(\mathbb{C})$, the algebra of $n \times n$ complex
matrices with the usual matrix operations of addition and multiplication and scalar
multiplication (by complex numbers).


Diag($n$) is an $n$-dimensional commutative subalgebra of $M_n(\mathbb{C})$. Furthermore, transposes
of diagonal matrices and inverses of nonsingular diagonal matrices are also diagonal (a
diagonal matrix is nonsingular if and only if the product of its diagonal entries, which
equals its determinant, is not zero). We record a number of consequences of the last theorem that
show that Circ($n$) has many similar properties. As a matter of fact, we will show that
Diag($n$) and Circ($n$) are isomorphic algebras.


COROLLARY 7.23. Circ($n$) is an $n$-dimensional commutative subalgebra of $M_n(\mathbb{C})$.
Furthermore, complex conjugates and transposes of circulant matrices and inverses of
nonsingular circulant matrices are also circulant. All elements of Circ($n$) are simultaneously
diagonalized by the same unitary matrix.


PROOF. Our first observation is that Circ($n$) is an $n$-dimensional vector space over the
complex numbers $\mathbb{C}$. Let $C$ be the $n \times n$ matrix that represents the linear transformation
sending the $l$-th unit vector $e_l$ (this is the vector $(0, \ldots, 0, 1, 0, \ldots, 0)$ with the 1 in the $l$-th
slot) to $x_l$:

(26) $C = \frac{1}{\sqrt{n}} \begin{pmatrix} 1 & 1 & \cdots & 1 & 1 \\ 1 & \epsilon & \cdots & \epsilon^{n-2} & \epsilon^{n-1} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 1 & \epsilon^{n-2} & \cdots & \epsilon^{(n-2)^2} & \epsilon^{(n-1)(n-2)} \\ 1 & \epsilon^{n-1} & \cdots & \epsilon^{(n-2)(n-1)} & \epsilon^{(n-1)^2} \end{pmatrix}.$

Observe that $C$ is symmetric (its own transpose) and that $C^*$, the transpose of the conjugate
of $C$, equals $C^{-1}$, the inverse of $C$. Thus $C$ is a symmetric unitary matrix. A lengthy but
routine calculation shows that (25) holds. This calculation can be avoided if we use the
definitions and results on eigenvalues at our disposal:

$$C^{-1}VC(e_l) = C^{-1}V(x_l) = C^{-1}(\lambda_l x_l) = \lambda_l C^{-1}(x_l) = \lambda_l e_l = D_V(e_l).$$

Thus the unitary matrix $C$, that depends only on $n$, diagonalizes each circulant matrix. It
is convenient to fix the matrix $C$ and study the map

$$C^* : \mathrm{Circ}(n) \ni V \mapsto C^{-1}VC \in \mathrm{Diag}(n).$$

For $U$ and $V \in M_n(\mathbb{C})$,
(27) $C^{-1}VC = U$ iff $V = CUC^{-1}.$
Since for all $V_1$ and $V_2 \in \mathrm{Circ}(n)$ and all $c \in \mathbb{C}$,
$$C^*(V_1 + cV_2) = C^*(V_1) + cC^*(V_2)$$
and as a result of (27) for $V \in \mathrm{Circ}(n)$,
$$C^*(V) = 0 \text{ iff } V = 0,$$
we conclude that $C^*$ is a $\mathbb{C}$-linear injection of Circ($n$) into Diag($n$). Since
$$\dim(\mathrm{Circ}(n)) = n = \dim(\mathrm{Diag}(n)),$$
we conclude that $C^*$ is surjective; that is, $C^*(\mathrm{Circ}(n)) = \mathrm{Diag}(n)$ and hence that Circ($n$)
and Diag($n$) are isomorphic as vector spaces over $\mathbb{C}$ and $(C^*)^{-1} = C \cdot C^{-1}$ maps Diag($n$)
onto Circ($n$). Since for two diagonal matrices $D_1$ and $D_2$,

$$(C^*)^{-1}(D_1 D_2) = C D_1 D_2 C^{-1} = (C D_1 C^{-1})(C D_2 C^{-1}) = (C^*)^{-1}(D_1)\,(C^*)^{-1}(D_2),$$
we conclude that Circ($n$) is closed under matrix multiplication. Since Diag($n$) is closed under
complex conjugation, transposes and inverses (of nonsingular matrices that it contains), so is
Circ($n$) by an argument similar to the one used to show that Circ($n$) is closed under matrix
multiplication.


COROLLARY 7.24 (of proof). Circ($n$) and Diag($n$) are isomorphic subalgebras of $M_n(\mathbb{C})$.

We proceed to describe another algebra that is isomorphic to Circ($n$). If we let
$$W = \mathrm{circ}\{0, 1, 0, \ldots, 0\},$$
then it is easily seen that
$$\mathrm{circ}\{v_0, v_1, \ldots, v_{n-1}\} = \sum_{i=0}^{n-1} v_i W^i.$$

REMARK 7.25. With respect to the standard basis of $\mathbb{C}^n$, the shift operator $T$ is represented
by the transpose of the matrix $W$; that is, by $\mathrm{circ}\{0, 0, \ldots, 0, 1\}$.


COROLLARY 7.26. The map that sends $W$ to the indeterminate $X$ establishes an isomorphism
of algebras between Circ($n$) and the algebra $\mathbb{C}[X]/(X^n - 1)$.


DEFINITION 7.27. Given a circulant matrix $V = \mathrm{circ}\{v_0, v_1, \ldots, v_{n-1}\}$, we define its
representer as the polynomial $P_V(X) = \sum_{i=0}^{n-1} v_i X^i$.

COROLLARY 7.28. For $l = 0, \ldots, n-1$, we have that $\lambda_l = P_V(\epsilon^l)$.


We know that a matrix cannot be recovered from its collection of eigenvalues, not even
from an ordered set of eigenvalues. However, given an ordered set of $n$ eigenvalues $\{\lambda_l;\ l =
0, 1, \ldots, n - 1\}$, there exists a unique diagonal matrix $D$ with entry $\lambda_l$ in the $l$-th slot. Thus
$CDC^{-1}$ is the unique circulant matrix with this set of ordered eigenvalues.
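Recovering the circulant matrix from its ordered eigenvalues amounts to inverting the relation $\lambda_l = P_V(\epsilon^l)$. The Python sketch below illustrates this under the assumption $\epsilon = e^{2\pi i/n}$; the function names are hypothetical, and the inversion formula $v_j = \frac{1}{n}\sum_l \epsilon^{-lj}\lambda_l$ is the standard consequence of $\sum_l \epsilon^{l(k-j)} = 0$ for $k \ne j$ rather than a formula quoted from the text.

```python
import cmath

def ordered_eigenvalues(v):
    """lambda_l = P_V(eps^l) = sum_j v_j eps^(l*j), for l = 0, ..., n-1."""
    n = len(v)
    eps = cmath.exp(2j * cmath.pi / n)
    return [sum(eps ** (l * j) * v[j] for j in range(n)) for l in range(n)]

def first_row_from_eigenvalues(lam):
    """First row (v_0, ..., v_{n-1}) of the circulant matrix whose
    ordered eigenvalues are lam: v_j = (1/n) sum_l eps^(-l*j) lambda_l."""
    n = len(lam)
    eps = cmath.exp(2j * cmath.pi / n)
    return [sum(eps ** (-l * j) * lam[l] for l in range(n)) / n
            for j in range(n)]
```

Reordering the eigenvalues before calling `first_row_from_eigenvalues` generally changes the resulting vector, but not its entry $v_0$, which is the average of the eigenvalues.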


EXERCISES 
(1) A complex $n \times n$ matrix $V$ is Hermitian if and only if $\bar{V}^t = V$.
(a) What is the dimension over $\mathbb{R}$ of the vector space of $n \times n$ complex Hermitian
matrices?
(b) What is the dimension over $\mathbb{R}$ of the vector space of $n \times n$ complex Hermitian
circulant matrices?
(2) Let $v \in \mathbb{C}^n$ be the row vector
$$v = (v_0, v_1, \ldots, v_{n-1}).$$
We have associated with $v$ several other algebraic quantities: a circulant matrix
$$V = \mathrm{circ}\{v\} \in \mathrm{Circ}(n),$$
an ordered set of eigenvalues
$$\lambda_V = \{\lambda_0, \lambda_1, \ldots, \lambda_{n-1}\} \in \mathbb{C}^n,$$
the characteristic polynomial of $V$
$$p_V(x) = \det(xI - V) = \prod_{i=0}^{n-1} (x - \lambda_i) = x^n + \sum_{k=0}^{n-1} a_k x^k,$$
hence also a vector
$$a = \{a_0, a_1, \ldots, a_{n-1}\} \in \mathbb{C}^n \text{ with } a_{n-1} = -n v_0,$$
and the representer of $V$
$$P_V(x) = \sum_{k=0}^{n-1} v_k x^k.$$

Let $J \in \mathbb{C}^n$ and $K \in \mathrm{Circ}(n)$ be respectively the vector and matrix with all entries
equal to 1 (thus $K = \mathrm{circ}\{J\}$). Let
$$v' = (0, v_1 - v_0, \ldots, v_{n-1} - v_0) = v - v_0 J,$$
and denote by symbols with primes the associated quantities for $v'$. Show that
(a) $V' = V - v_0 K$.
(b) $\lambda_{V'} = \{\lambda_0 - n v_0, \lambda_1, \ldots, \lambda_{n-1}\} = \{-\sum_{l=1}^{n-1} \lambda_l, \lambda_1, \ldots, \lambda_{n-1}\}$.
(c) $p_{V'}(x) = \left(x + \sum_{l=1}^{n-1} \lambda_l\right) \prod_{l=1}^{n-1} (x - \lambda_l)$.
cle: eae Cement: ae 5.0 
(e) $P_{V'}(x) = \sum_{k=1}^{n-1} (v_k - v_0) x^k$.
(f) $P_{V'}(\epsilon^l) = \lambda_l$ for $l = 1, 2, \ldots, n-1$.
(g) trace($V'$) = 0.
(3) Show that the $n \times n$ circulant matrices with trace 0 form an $(n - 1)$-dimensional
subspace of Circ($n$).
(4) Let $a \in \mathbb{C}$. Show that there exists a
$$V'' \in \mathrm{Circ}(n)$$
such that (with notation as in the previous exercise)
$$\lambda_{V''} = \{\lambda_0 - a, \lambda_1 - a, \ldots, \lambda_{n-1} - a\} = \lambda_V - aJ.$$

Show that
(a) $p_{V''}(x) = p_V(x + a) = \det((x + a)I - V)$.
(b) trace($V''$) = trace($V$) $- na$ and hence that for $a = v_0$, trace($V''$) = 0.
(©) BSG. Opi. 


(d) $P_{V''}(\epsilon^l) = \lambda_l - a$ for $l = 0, 1, \ldots, n-1$.


3. Roots of polynomials of small degree 


The last corollary and the interplay between the characteristic polynomial $p_V$ of a
circulant matrix $V$ and its representer $P_V$ lead us to a method for finding roots of (monic)
polynomials of degree less than or equal to 4.

The roots of the characteristic polynomial of an arbitrary $n \times n$ matrix $V$ (these are the
eigenvalues of the matrix $V$) are obtained by solving a monic $n$-degree polynomial equation.
However, in the case of circulant matrices $V$, the roots of $p_V$ are easily calculated using the
auxiliary companion polynomial $P_V$, the representer of $V$. Thus if a given polynomial $p$ is
known to be the characteristic polynomial of a KNOWN circulant matrix $V$, its zeros can be
readily found. This remark is the basis of the method we will describe for solving polynomials
of low degree. It is thus of considerable interest to determine which monic polynomials are
characteristic polynomials of circulant matrices. Further, if we are given that $p = p_V$ for
some circulant matrix $V$, can we determine $V$, or equivalently $P_V$, directly from $p$?

We can obviously recover $V$ from its representer. If $\lambda = \{\lambda_0, \ldots, \lambda_{n-1}\}$ is an ordered set
of eigenvalues (viewed in (28) as a column vector in $\mathbb{C}^n$), then there is a unique circulant
matrix $V = \mathrm{circ}\{v\} = \mathrm{circ}\{v_0, v_1, \ldots, v_{n-1}\}$ whose ordered eigenvalues are $\lambda$:

(28) $v = \frac{1}{\sqrt{n}}\, C^{-1} \lambda.$

If the eigenvalues are distinct, then there are precisely $n!$ ordered sets of eigenvalues producing
the same characteristic polynomial. In this case, there are $n!$ circulant matrices $V$ with
characteristic polynomial $p_V$. Corollary 7.20 tells us that for each such circulant matrix
$V = \mathrm{circ}\{v_0, v_1, \ldots, v_{n-1}\}$, $v_0 = \frac{1}{n} \sum_{l=0}^{n-1} \lambda_l$ is independent of the ordering of the eigenvalues;
however, the $v_i$ for $1 \le i < n$ do depend on the ordering. If $k$ is the number of distinct
roots of the characteristic polynomial, then there are of course at least $k!$ circulant matrices
with the given characteristic polynomial. In particular, every monic polynomial $p$ is the
characteristic polynomial of some circulant matrix $V$.


DEFINITION 7.29. Let $n$ be a positive integer, $p$ be a monic $n$-th degree polynomial, and
$V$ an $n \times n$ circulant matrix. We say that $V$ adheres to $p$ or $V$ is the adhering circulant
matrix to $p$ if $p = p_V$ (that is, if $p$ is the characteristic polynomial of $V$).


But the above argument avoids completely the issue of finding the roots of $p$, as the given
construction of $V$ from $p$ started by assuming we had the roots of the polynomial. So the
more difficult question is the construction of $V$ (or equivalently, its representer $P_V$) in terms
of the coefficients of the polynomial $p$.

We are now ready to state and try to solve the problem of interest. Let us consider a 
monic n-th degree polynomial p: 


(29) p(x) = 2" + Oy_10" 1 + dnote” 2 +... + ae + aD, 
where a; € C. A basic result in complex analysis (which we now use) tells us that the 


polynomial has precisely n roots, counting multiplicities. Thus there exist complex numbers 
GB; such that 


P(x) = (@ — Pi)(@ — a)...(@ — Bn). 


3. ROOTS OF POLYNOMIALS OF SMALL DEGREE 153 


The task is to find these roots (;. Since 


n 
QAn-1 = S- Ba 
i=1 


eliminates the term of degree n — 1; that is, it changes (29) to 


An-1 
n 


the substitution y = «+ 


An in ee 
(30) aly) =p (y- tay + 4,0? BoE ay E703 


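where the constants $\gamma_i$ are easily computed in terms of the $a_i$. If we can solve equation (30),
then we can certainly also solve (29) since
\[ q(y) = \left(y - \beta_1 - \frac{a_{n-1}}{n}\right)\left(y - \beta_2 - \frac{a_{n-1}}{n}\right)\cdots\left(y - \beta_n - \frac{a_{n-1}}{n}\right). \]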


In the notation of the last set of exercises, if $p = p_V$, then $q = p_{V^a}$, where we use $a = -\frac{a_{n-1}}{n}$
in the definition of $V^a$.
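The reduction from (29) to (30) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `shift_polynomial` and `normalize` are hypothetical, coefficients are listed from the constant term upward, and exact rational arithmetic is used so that the vanishing of the degree $n - 1$ term is visible without rounding.

```python
from fractions import Fraction


def shift_polynomial(coeffs, s):
    """Return the coefficients of p(y + s), lowest degree first.

    coeffs lists the coefficients of p, lowest degree first.
    Repeated synthetic division by (x - s) produces the Taylor
    coefficients of p at s, which are the coefficients of p(y + s).
    """
    coeffs = [Fraction(c) for c in coeffs]
    n = len(coeffs)
    for i in range(n - 1):
        for j in range(n - 2, i - 1, -1):
            coeffs[j] += s * coeffs[j + 1]
    return coeffs


def normalize(coeffs):
    """Remove the degree n-1 term of a monic polynomial.

    Uses x = y - a_{n-1}/n, as in the passage from (29) to (30).
    """
    n = len(coeffs) - 1
    s = -Fraction(coeffs[n - 1]) / n
    return shift_polynomial(coeffs, s)


# x^4 - 10x^3 + 35x^2 - 50x + 24 becomes y^4 - (5/2) y^2 + 9/16
print(normalize([24, -50, 35, -10, 1]))
```

The quartic used here is the one solved later in the MAPLE session, so the output can be compared with the normalized polynomial computed there.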


3.1. Roots of linear and quadratic polynomials. The solution of the linear monic
equation
\[ x + a_0 = 0 \]
does not present any problems; so we proceed to the quadratic monic equation (still only a
warm-up exercise)
\[ x^2 + a_1 x + a_0 = 0. \tag{31} \]


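The classical solution of the last equation is based on completing the square; rewriting it as
\[ x^2 + a_1 x + \left(\frac{a_1}{2}\right)^2 + a_0 - \frac{a_1^2}{4} = \left(x + \frac{a_1}{2}\right)^2 + a_0 - \frac{a_1^2}{4} = 0, \]
we see that its roots are
\[ x = -\frac{a_1}{2} \pm \sqrt{\frac{a_1^2}{4} - a_0}. \]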


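As a warm-up exercise we use circulant matrices to solve for the roots of (31). We are looking
for a circulant $2 \times 2$ matrix
\[ V = \begin{pmatrix} a & b \\ b & a \end{pmatrix} \]
whose characteristic polynomial
\[ p_V(x) = \det\begin{pmatrix} x - a & -b \\ -b & x - a \end{pmatrix} = x^2 - 2ax + a^2 - b^2 \]
equals (31). We are thus required to solve
\[ -2a = a_1, \qquad a^2 - b^2 = a_0. \]
The solution is easily seen to be
\[ a = -\frac{a_1}{2} \quad\text{and}\quad b = \sqrt{\frac{a_1^2}{4} - a_0}. \]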




There are in general, of course, two square roots of the complex number $\frac{a_1^2}{4} - a_0$; we choose
one of them. The representer for $V$ is hence the polynomial
\[ P_V(x) = -\frac{a_1}{2} + x\sqrt{\frac{a_1^2}{4} - a_0}. \]
Thus the roots of the original quadratic polynomial (31) (which is also $p_V$) are
\[ P_V(1) = -\frac{a_1}{2} + \sqrt{\frac{a_1^2}{4} - a_0} \]
and
\[ P_V(-1) = -\frac{a_1}{2} - \sqrt{\frac{a_1^2}{4} - a_0}. \]
We observe, as expected, that our choice of the square root of $\frac{a_1^2}{4} - a_0$ only affected the
order of the roots we found.
We leave it to the reader to recast the above discussion in terms of 2 x 2 circulant matrices 
of trace 0. 
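As a quick numerical companion to the warm-up above, the following Python sketch evaluates the representer of the $2 \times 2$ circulant at $1$ and $-1$. The function name is hypothetical, and `cmath.sqrt` simply picks the principal square root, which is one of the two admissible choices.

```python
import cmath


def quadratic_roots_via_circulant(a1, a0):
    """Roots of x^2 + a1*x + a0 as eigenvalues of the circulant circ{a, b}."""
    a = -a1 / 2
    b = cmath.sqrt(a1 * a1 / 4 - a0)   # one of the two square roots

    def representer(x):
        # P_V(x) = a + b*x
        return a + b * x

    return representer(1), representer(-1)


# x^2 - 3x + 2 = (x - 1)(x - 2)
print(quadratic_roots_via_circulant(-3, 2))
```

Choosing the other square root would return the same two numbers in the opposite order, in line with the remark above.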


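3.2. The general case. We are now given the monic $n$-th degree polynomial $p$ of (29)
and we are trying to find a circulant matrix $V = \operatorname{circ}\{v\} = \operatorname{circ}\{v_0, v_1, \ldots, v_{n-1}\}$ whose
characteristic polynomial
\[ p_V(x) = \det(xI - V) = \det\begin{pmatrix}
x - v_0 & -v_1 & \cdots & -v_{n-2} & -v_{n-1} \\
-v_{n-1} & x - v_0 & \cdots & -v_{n-3} & -v_{n-2} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
-v_2 & -v_3 & \cdots & x - v_0 & -v_1 \\
-v_1 & -v_2 & \cdots & -v_{n-1} & x - v_0
\end{pmatrix} \]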
\[ = \prod_{i=0}^{n-1} \left(x - P_V(\omega^i)\right), \qquad \omega = e^{2\pi\sqrt{-1}/n}, \]


is equal to $p$, the polynomial whose roots we are trying to find. We are thus attempting to
solve for the $n$ unknowns $v_i$ in $n$ equations; the first of these is
\[ -a_{n-1} = \sum_{i=0}^{n-1} v_0 = n v_0 = \operatorname{trace}(V). \]


We have already seen that we can make a simple change of variable to eliminate the 
term of degree n — 1 in the equation (29) and hence solve for the roots of (30). We are thus 
looking for a traceless circulant matrix V; the first equation to be solved is now 


\[ 0 = \sum_{i=0}^{n-1} v_0 = n v_0 = \operatorname{trace}(V). \]


We will use this reduction in the next two subsections. 




3.3. Roots of cubics. We start with the normalized⁴ monic cubic
\[ q(y) = y^3 + \beta y + \gamma \]
and look for a traceless circulant matrix
\[ C = \begin{pmatrix} 0 & b & c \\ c & 0 & b \\ b & c & 0 \end{pmatrix} \]
whose characteristic polynomial
\[ \det(yI - C) = \begin{vmatrix} y & -b & -c \\ -c & y & -b \\ -b & -c & y \end{vmatrix} = y^3 - 3bcy - (b^3 + c^3) \]
equals $q(y)$. The equations to be solved for $b$ and $c$ are
\[ b^3 + c^3 = -\gamma \quad\text{and}\quad 3bc = -\beta. \]
Cubing the last equation we solve for $b^3$ and $c^3$ in
\[ b^3 + c^3 = -\gamma \quad\text{and}\quad 27b^3c^3 = -\beta^3. \]
Replacing, as is permitted by the second equation, for example, $c^3$ by $-\frac{\beta^3}{27b^3}$ in the first of
these equations and then using the quadratic formula we conclude that
\[ b^3 = \frac{-\gamma \pm \sqrt{\gamma^2 + \frac{4\beta^3}{27}}}{2}. \]
The same formula holds for $c^3$; however, we must choose square roots consistently. Thus,
having chosen one square root, we have
\[ b^3 = \frac{-\gamma + \sqrt{\gamma^2 + \frac{4\beta^3}{27}}}{2} \quad\text{and}\quad c^3 = \frac{-\gamma - \sqrt{\gamma^2 + \frac{4\beta^3}{27}}}{2}. \]
We must use a little care, as suggested by the last displayed equation, because we are working
with complex numbers. We choose one of the six⁵ possible values of
\[ b = \left(\frac{-\gamma \pm \sqrt{\gamma^2 + \frac{4\beta^3}{27}}}{2}\right)^{1/3}. \]
We then set
\[ c = -\frac{\beta}{3b}. \]

⁴ To have 0 as the coefficient of its quadratic term.
⁵ In the generic case.

To keep track of what is going on, let us write
\[ 2b^3 = -\gamma + \sqrt{\gamma^2 + \frac{4\beta^3}{27}} = -\gamma + \delta, \]
and observe that the definition of $\delta$ involves a choice of a square root and that the resulting
$\delta$ may be replaced by $-\delta$ and the resulting two values of $b$ may each be replaced by $\omega^j b$ with
$\omega = e^{2\pi i/3}$ and $j = 1$ or $2$. The values of $b$ and $c$ are described by
\[ b = \left(\frac{-\gamma + \delta}{2}\right)^{1/3} \quad\text{and}\quad c = \left(\frac{-\gamma - \delta}{2}\right)^{1/3}; \]
it follows from these equations that $3bc = -(\beta^3)^{1/3}$. Having chosen a cube root in the formula
for $b$, the correct cube root must be chosen in the formula for $c$ so that the last equation
becomes $3bc = -\beta$. With these conventions in mind, the three roots of $q$ can now be written
as $P(\omega^j)$ with $j = 0, 1, 2$ where
\[ P(y) = by + cy^2. \]
Thus the roots are
\[ r_1 = b + c = \left(\frac{-\gamma + \delta}{2}\right)^{1/3} + \left(\frac{-\gamma - \delta}{2}\right)^{1/3}, \]
\[ r_2 = b\omega + c\omega^2 = \omega\left(\frac{-\gamma + \delta}{2}\right)^{1/3} + \omega^2\left(\frac{-\gamma - \delta}{2}\right)^{1/3}, \]
and
\[ r_3 = b\omega^2 + c\omega^4 = \omega^2\left(\frac{-\gamma + \delta}{2}\right)^{1/3} + \omega\left(\frac{-\gamma - \delta}{2}\right)^{1/3}. \]

We can determine the effect of the various choices made on the values of the roots.
Replacing $\delta$ by $-\delta$ certainly interchanges $b$ and $c$. Hence $r_1$ is fixed while $r_2$ and $r_3$ are
permuted. Keeping $\delta$ fixed and replacing a cube root choice, say $b$ by $\omega b$ (the only other
possibility is $\omega^2 b$), results in the replacement of $c$ by $\omega^2 c$. Thus $r_1$ is replaced by $r_2$, $r_2$
by $r_3$ and hence (without the need of a calculation) $r_3$ by $r_1$. In general, the various choices
made correspond to the action of the symmetric group $S(3)$ on the roots of the monic cubic
polynomial.

If we start with the general monic cubic
\[ p(x) = x^3 + a_2x^2 + a_1x + a_0, \]
then the change of variable $y = x + \frac{a_2}{3}$ reduces us to the normalized case with
\[ \beta = -\frac{1}{3}a_2^2 + a_1 \quad\text{and}\quad \gamma = \frac{2}{27}a_2^3 - \frac{1}{3}a_2a_1 + a_0. \]
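The recipe of this subsection can be tried numerically. The Python sketch below is an illustration only: the function name is hypothetical, the cube root is the principal one returned by complex exponentiation, and of the two candidates for $b^3$ the one of larger modulus is used (an assumption made here to avoid dividing by a vanishing $b$). The value of $c$ is then forced by $3bc = -\beta$.

```python
import cmath

OMEGA = cmath.exp(2j * cmath.pi / 3)   # primitive cube root of unity


def cubic_roots_via_circulant(beta, gamma):
    """Roots of y^3 + beta*y + gamma from the traceless circulant circ{0, b, c}."""
    delta = cmath.sqrt(gamma * gamma + 4 * beta ** 3 / 27)
    # b^3 and c^3 are (-gamma + delta)/2 and (-gamma - delta)/2 in some order.
    candidates = [(-gamma + delta) / 2, (-gamma - delta) / 2]
    b_cubed = max(candidates, key=abs)
    if b_cubed == 0:
        return [0j, 0j, 0j]            # beta = gamma = 0, so q(y) = y^3
    b = b_cubed ** (1 / 3)             # one cube root; the others permute the roots
    c = -beta / (3 * b)                # enforces 3bc = -beta
    # The roots are the values P(omega^j) of the representer P(y) = b*y + c*y^2.
    return [b * OMEGA ** j + c * OMEGA ** (2 * j) for j in range(3)]


# y^3 - 7y + 6 = (y - 1)(y - 2)(y + 3)
print(cubic_roots_via_circulant(-7, 6))
```

Floating-point output carries small rounding errors in the imaginary parts; a different cube root or square root only changes the order in which the three roots appear.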
3.4. Roots of quartics. To find the roots of the normalized monic quartic 


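\[ q(y) = y^4 + \beta y^2 + \gamma y + \delta \]
we look for a traceless $4 \times 4$ circulant matrix
\[ C = \begin{pmatrix} 0 & b & c & d \\ d & 0 & b & c \\ c & d & 0 & b \\ b & c & d & 0 \end{pmatrix} \]
whose characteristic polynomial
\[ \det(yI - C) = \begin{vmatrix} y & -b & -c & -d \\ -d & y & -b & -c \\ -c & -d & y & -b \\ -b & -c & -d & y \end{vmatrix}
= y^4 - (4bd + 2c^2)y^2 - 4c(b^2 + d^2)y + c^4 - b^4 - d^4 - 4bc^2d + 2b^2d^2 \]
equals $q(y)$. We thus have to solve for $b$, $c$ and $d$ the system
\[ \begin{aligned}
4bd + 2c^2 &= -\beta, \\
4c(b^2 + d^2) &= -\gamma, \\
c^4 - b^4 - d^4 - 4bc^2d + 2b^2d^2 &= \delta.
\end{aligned} \tag{32} \]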


Assume for the moment that $c \ne 0$. In this case, the first and second equations of (32) can
be rewritten as
\[ bd = -\frac{\beta + 2c^2}{4} \quad\text{and}\quad b^2 + d^2 = -\frac{\gamma}{4c}; \tag{33} \]
they determine $bd$ and $b^2 + d^2$ in terms of $c$. This suggests that the third equation of (32)
be rewritten as
\[ c^4 - 4bdc^2 - (b^2 + d^2)^2 + 4(bd)^2 = \delta. \]
Substitution of (33) into this equation and clearing fractions leads us to
\[ c^6 + \frac{\beta}{2}c^4 + \left(\frac{\beta^2}{16} - \frac{\delta}{4}\right)c^2 - \frac{\gamma^2}{64} = 0, \tag{34} \]
a monic (not normalized) cubic in $c^2$ that can be solved by the methods of the previous
subsection. Choose one non-zero root $c$ of (34). This is possible since 0 is a root of (34) if
and only if $\gamma = 0$; while all the roots of this equation are 0 if and only if $\beta = 0 = \delta = \gamma$.
Using this value of $c$, we solve for $b$ and $d$ in (33). The roots of $q(y)$ are then the values at
$1$, $i$, $-1$ and $-i$ of the representer
\[ P(y) = by + cy^2 + dy^3 \]
of the circulant matrix $C$. If $\gamma = 0$, we can of course use $c = 0$ and solve for $b$ and $d$ in
\[ 4bd = -\beta \quad\text{and}\quad b^2 - d^2 = \sqrt{-\delta}. \]
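The determinant expansion behind system (32) is easy to get wrong by hand, so a numerical sanity check may be useful. The following Python sketch (function names are hypothetical) computes $\beta$, $\gamma$, $\delta$ from arbitrary complex $b$, $c$, $d$ using (32), and then confirms that the four representer values $P(1)$, $P(i)$, $P(-1)$, $P(-i)$ are roots of the resulting quartic.

```python
def quartic_coefficients(b, c, d):
    """(beta, gamma, delta) of det(yI - C) for C = circ{0, b, c, d}, as in (32)."""
    beta = -(4 * b * d + 2 * c ** 2)
    gamma = -4 * c * (b ** 2 + d ** 2)
    delta = c ** 4 - b ** 4 - d ** 4 - 4 * b * c ** 2 * d + 2 * b ** 2 * d ** 2
    return beta, gamma, delta


def representer_values(b, c, d):
    """Values of P(y) = b*y + c*y^2 + d*y^3 at the fourth roots of unity."""
    return [b * t + c * t ** 2 + d * t ** 3 for t in (1, 1j, -1, -1j)]


def q(y, beta, gamma, delta):
    """The normalized quartic y^4 + beta*y^2 + gamma*y + delta."""
    return y ** 4 + beta * y ** 2 + gamma * y + delta


# Arbitrary test entries; any complex numbers would do.
b, c, d = 0.3 + 0.1j, -0.7, 1.2 - 0.4j
coeffs = quartic_coefficients(b, c, d)
residuals = [abs(q(r, *coeffs)) for r in representer_values(b, c, d)]
print(max(residuals))   # expected to be at the level of rounding error
```

This checks the forward direction only (from $b$, $c$, $d$ to the quartic); solving (32) for $b$, $c$, $d$ still requires the resolvent cubic (34).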


The various choices made lead to an action of S(4) on the roots of g. We illustrate the 
various ideas encountered with a MAPLE program to solve the equation 


\[ x^4 - 10x^3 + 35x^2 - 50x + 24 = 0. \]
MAPLE SESSION #xx

> p := x -> x^4 - 10 * x^3 + 35 * x^2 - 50 * x + 24;
                p := x -> x^4 - 10 x^3 + 35 x^2 - 50 x + 24
> solve(p(x) = 0, x);
                1, 2, 3, 4
> q := y -> p(y + 5/2);
                q := y -> p(y + 5/2)
> expand(q(y));
                y^4 - 5/2 y^2 + 9/16
> solve(z^2 - 5/2 * z + 9/16 = 0, z);
                9/4, 1/4
> beta := -5/2: gama := 0: delta := 9/16:
> solve(t^3 + (beta/2) * t^2 + (beta^2/16 - delta/4) * t - gama^2/64 = 0, t);
                0, 1, 1/4
> c := 0;
                c := 0
> solve({4 * b * d = -beta, b^2 - d^2 = sqrt(-delta)}, {b, d});
    {d = -1/4 + 3/4 I, b = -1/4 - 3/4 I}, {d = 1/4 - 3/4 I, b = 1/4 + 3/4 I},
    {d = -3/4 + 1/4 I, b = -3/4 - 1/4 I}, {d = 3/4 - 1/4 I, b = 3/4 + 1/4 I}
> b := -1/4 - 3/4 * I: d := -1/4 + 3/4 * I:
> P := t -> b * t + c * t^2 + d * t^3;
                P := t -> b t + c t^2 + d t^3
> P(1) + 5/2, P(I) + 5/2, P(-1) + 5/2, P(-I) + 5/2;
                2, 4, 3, 1

***END MAPLE PROGRAM***


We describe the various steps of the above program. 


(1) The first line of the program introduces the polynomial p(x) to be solved. 

(2) The second line uses the internal MAPLE command to solve this equation (as a 
check on our work). 

(3) The third and fourth lines change the polynomial $p(x)$ to its normal form $q(y)$. We
note that the roots of $p(x)$ are the roots of $q(y)$ plus $\frac{5}{2}$.

(4) The next line (again a check on our work) uses MAPLE to solve the normalized
polynomial $q(y)$; this is straightforward because $q$ is a quadratic in $y^2$.

(5) Next, the constants defining $q(y)$ are entered.

(6) We solve for the possible values of $c^2$ and use $c = 0$. Thus we will use a circulant
matrix of the form $V = \operatorname{circ}\{0, b, 0, d\}$.

(7) We proceed to solve for the possible values of b and d, and choose a solution set 
(out of the 4 possibilities). 

(8) The last two commands calculate the roots of p(x) using the representer P(y) of V. 
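For readers without MAPLE, the last steps of the session can be mirrored in Python. This is a sketch that reuses the values of $b$ and $d$ selected in the session above rather than solving for them; the variable names are chosen here for illustration. It verifies the system (32) with $c = 0$ and then evaluates the representer at the fourth roots of unity.

```python
beta, gamma, delta = -5 / 2, 0.0, 9 / 16   # q(y) = y^4 - (5/2) y^2 + 9/16
shift = 5 / 2                              # roots of p are roots of q plus 5/2

# The solution set chosen in the session, with c = 0 because gamma = 0.
b = complex(-1 / 4, -3 / 4)
c = 0
d = complex(-1 / 4, 3 / 4)

# Check the three equations of (32).
assert abs(4 * b * d + 2 * c ** 2 + beta) < 1e-12
assert abs(4 * c * (b ** 2 + d ** 2) + gamma) < 1e-12
assert abs(c ** 4 - b ** 4 - d ** 4 - 4 * b * c ** 2 * d + 2 * b ** 2 * d ** 2 - delta) < 1e-12

# Roots of p(x): representer values at 1, i, -1, -i, shifted back by 5/2.
roots = [b * t + c * t ** 2 + d * t ** 3 + shift for t in (1, 1j, -1, -1j)]
print(roots)
```

Picking another of the four solution sets for $(b, d)$ produces the same four roots in a different order.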


3.5. Real roots and roots of absolute value 1. We fix throughout this subsection
an $n \in \mathbb{Z}_{>0}$, a monic $n$-th degree polynomial $p$ and an $n \times n$ matrix $V \in \operatorname{Circ}(n)$ that adheres
to $p$. We are interested in learning what properties of $p$ are reflected in $V$. Recall that in
the case under study $p$ is the characteristic polynomial of $V$.


THEOREM 7.30. The monic polynomial $p$ has only real roots if and only if its adhering
circulant matrix $V$ is Hermitian.


PROOF. It is easy to see that a monic polynomial $p \in \mathbb{C}[x]$ with only real roots must
belong to $\mathbb{R}[x]$ (that is, it must have real coefficients). A matrix $V$ is Hermitian iff $V = V^*$.
We know from linear algebra that the eigenvalues of a Hermitian matrix are real. Thus if $V$
is Hermitian, $p$ has real roots. Conversely, if $p$ has real roots, then $V$ has real eigenvalues.
Thus there exists an $n \times n$ real diagonal matrix $D$ with $V = CDC^{-1}$, where $C$ is defined by (26)
(in particular, $C^{-1} = C^*$). We now compute
\[ V^* = (CDC^*)^* = CD^*C^* = CDC^* = V; \]
thus $V$ is Hermitian. □


Many other properties of p(a) can be read off from V. We record some of these in 


THEOREM 7.31. Let $p$ be a monic polynomial with adhering circulant matrix $V$. Then
(a) the roots of $p(x)$ are real if and only if $V$ is Hermitian ($V = V^*$),
(b) the roots of $p(x)$ have absolute value 1 if and only if $V$ is unitary ($V^{-1} = V^*$), and
(c) the roots of $p(x)$ are purely imaginary if and only if $V$ is skew-Hermitian ($V = -V^*$).
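Part (a) can be observed experimentally. The Python sketch below builds a circulant with prescribed eigenvalues by an inverse discrete Fourier transform and tests whether it is Hermitian. The convention assumed here (not necessarily the one fixed by the matrix $C$ of (26)) is that the first row is $(v_0, \ldots, v_{n-1})$, each row is the previous one shifted to the right, and the eigenvalues are $P(\omega^j) = \sum_k v_k \omega^{jk}$; all function names are hypothetical.

```python
import cmath


def adhering_circulant(roots):
    """First row (v_0, ..., v_{n-1}) of a circulant whose eigenvalues are `roots`."""
    n = len(roots)
    w = cmath.exp(2j * cmath.pi / n)
    # Inverse DFT: v_k = (1/n) * sum_j lambda_j * w^(-j*k)
    return [sum(lam * w ** (-j * k) for j, lam in enumerate(roots)) / n
            for k in range(n)]


def circulant_matrix(v):
    """Full matrix of circ{v_0, ..., v_{n-1}} under the convention stated above."""
    n = len(v)
    return [[v[(col - row) % n] for col in range(n)] for row in range(n)]


def is_hermitian(m, tol=1e-9):
    """Check V = V* entrywise, up to a small tolerance."""
    n = len(m)
    return all(abs(m[r][s] - m[s][r].conjugate()) < tol
               for r in range(n) for s in range(n))


real_case = circulant_matrix(adhering_circulant([1, 2, 3, 4]))
mixed_case = circulant_matrix(adhering_circulant([1, 2j, 3]))
print(is_hermitian(real_case), is_hermitian(mixed_case))
```

Note that $v_0$ comes out as the average of the prescribed roots, in agreement with the trace condition of subsection 3.2.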


3.6. What goes wrong for polynomials of higher degree? Will the method described
in subsections 3.3 and 3.4 work on polynomials of degree $\ge 5$? The answer to
this question is a resounding NO, but that is a topic for an entirely different chapter of
mathematics.


EXERCISES 


Qe + 2? + 32°. 


g 
14274 2°. 
1 
1 


x+2x? 4+ 2%. 

(d) 1+ a2 + 22? + 32? +27. 

(2) Complete the solution of the example worked out by the MAPLE program by choosing
a value $c \ne 0$. What is the formula for the representer $P(y)$ in this case? What
element of $S(4)$ is represented by the permutation that changes the MAPLE solution
to your solution?

(3) (a) Prove parts (b) and (c) of Theorem 7.31 using the ideas in the proof of Theorem
7.30 (part (a) of Theorem 7.31).

(b) Deduce part (c) of Theorem 7.31 from Theorem 7.30 and the observation that
a monic polynomial $p(x)$ of degree $n$ has purely imaginary roots if and only if
the monic polynomial $\frac{p(ix)}{i^n}$ has only real roots. Describe an adhering circulant
matrix $V$ for $p(x)$ in terms of one for $\frac{p(ix)}{i^n}$.


CHAPTER 8 


Moduli for polynomials 


Preliminary version of chapter. Written in a form that should be accessible to most high 
school mathematics teachers. Certain results from previous chapters are repeated here to 
make the presentation more self-contained.


1. Polynomials in three guises 


Let $n$ be a positive integer. We will be dealing with expressions of the form
\[ p(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_0, \tag{35} \]
or alternatively just the expression
\[ a_nx^n + a_{n-1}x^{n-1} + \cdots + a_0, \tag{36} \]
where $x$ is an indeterminate or independent variable, each $a_i$ is a real number (or more
generally, a complex number) and $a_n \ne 0$.

We call (36) an $n$-th degree polynomial (with real or complex coefficients). Polynomials
with real (complex) coefficients will be referred to as real (complex) polynomials. At times
there are significant differences between the two cases. Polynomials can be manipulated
using the usual rules of arithmetic; for example, replacing $x$ by $\alpha x + \beta$ with $\alpha \ne 0$ and $\beta$
real or complex numbers transforms it into another $n$-th degree polynomial. An algebraist
would say that (36) is an element of the integral domain $\mathbb{R}[x]$ or $\mathbb{C}[x]$.

It is quite clear that (36) and (35) are more or less the same object. The introduction of 
the extra symbol p(x) or p in (35) reminds us that we may also think of a polynomial as a 
function 


\[ p : \mathbb{R} \to \mathbb{R} \quad\text{or}\quad p : \mathbb{C} \to \mathbb{C}, \]
with $p(x)$ denoting the value of this function at $x \in \mathbb{R}$ or $\mathbb{C}$, respectively. When viewed as
a function from $\mathbb{C}$ to itself a real polynomial has the added feature of assuming real values
at the real points in $\mathbb{C}$. The symbol $p$ is the dependent variable and $y$ is another common
letter used to represent it. Associated to any function $f : \mathbb{R} \to \mathbb{R}$ is its graph, the set
\[ \{(x, y) \in \mathbb{R}^2 \text{ such that } y = f(x)\}. \]
This set is often identified with the function $f$.
When discussing roots of the polynomial (36) or (35) we are looking for those $x$ (usually
in $\mathbb{C}$) that satisfy the equation
\[ 0 = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_0. \]




2. An example from high school math: the quadratic polynomial 
The study of first degree polynomials
\[ l(x) = ax + b \]
with complex coefficients ($a$ and $b$ are complex numbers and $a \ne 0$) is certainly trivial for
all high school mathematics teachers. The quadratic polynomial
\[ y(x) = ax^2 + bx + c \tag{37} \]
with complex coefficients ($a$, $b$ and $c$ are complex numbers and $a \ne 0$) is only slightly more
challenging. We present a treatment that is much more concise than the one found in many
high school texts (for example in [5]). The graph of the standard form of the real quadratic
polynomial $y(x) = x^2$ (the special case $a = 1$, $b = 0 = c$) is certainly familiar to every
reader. By completing squares and applying elementary algebraic manipulations one easily
concludes from (37) that
\[ y(x) = a\left[\left(x + \frac{b}{2a}\right)^2 - \frac{b^2 - 4ac}{4a^2}\right]. \tag{38} \]
This tells the complete story for quadratic polynomials:

• Every quadratic polynomial has two, perhaps complex, roots counting multiplicities.
• The roots of (37) are easily obtained from its equivalent form (38); they are
\[ -\frac{b}{2a} \pm \frac{\sqrt{b^2 - 4ac}}{2a}. \]
• For real quadratic polynomials, the roots are real and distinct if the discriminant
$b^2 - 4ac$ of the polynomial is positive.
• There is one double real root if the discriminant vanishes.
• There is a pair of conjugate complex roots if the discriminant is negative.
• The linear change of coordinates
\[ X = x + \frac{b}{2a} \quad\text{and}\quad Y = \frac{y}{a} + \frac{b^2 - 4ac}{4a^2} \]
transforms (37) (and (38), of course) to standard form $Y = X^2$.

Up to change of coordinates this is all there is to quadratic polynomials; that is, the
general quadratic polynomial $ax^2 + bx + c$ is obtained from the standard quadratic $x^2$ by
first pre-composing with the affine map $x \mapsto x + \frac{b}{2a}$ and then post-composing with the affine
map $x \mapsto ax + c - \frac{b^2}{4a}$. We illustrate the concept of change of coordinates in Figure 1 with
an example that hints at the important equivalence relation we are about to introduce.

We do not claim this approach is the way an excellent teacher would present the quadratic 
to a high school audience, but rather that every competent high school mathematics teacher 


should be aware of this elegant and short treatment. 
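The change of coordinates for the quadratic can be checked point by point with exact arithmetic. In the Python sketch below (the function name is hypothetical), a point $(x, y(x))$ on the graph of $y = ax^2 + bx + c$ is mapped to $(X, Y)$ with $X = x + \frac{b}{2a}$ and $Y = \frac{y}{a} + \frac{b^2 - 4ac}{4a^2}$, and one can confirm that it lands on the standard parabola $Y = X^2$.

```python
from fractions import Fraction


def to_standard_form(a, b, c, x):
    """Map the point (x, y(x)) on y = a*x^2 + b*x + c to (X, Y) with Y = X^2."""
    a, b, c, x = (Fraction(t) for t in (a, b, c, x))
    y = a * x * x + b * x + c
    X = x + b / (2 * a)
    Y = y / a + (b * b - 4 * a * c) / (4 * a * a)
    return X, Y


# The quadratic y = -2x^2 + x + 1 used in Figure 1.
for x in range(-3, 4):
    X, Y = to_standard_form(-2, 1, 1, x)
    print(x, X, Y, Y == X * X)
```

For this example the formulas specialize to $X = x - \frac{1}{4}$ and $Y = -\frac{y}{2} + \frac{9}{16}$.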


3. An equivalence relation 


If we start, in general, with a polynomial y = y(x) in the independent variable x and use 
the linear change of both independent and dependent variables 


\[ X = ax + b, \qquad Y = cy + d, \]


FIGURE 1. On the left is the graph of $y = -2x^2 + x + 1$. Apply the change of
coordinates $X = x - \frac{1}{4}$, $Y = -\frac{y}{2} + \frac{9}{16}$ to get the standard form $Y = X^2$ graphed
on the right.


where $a$, $b$, $c$ and $d$ are complex constants with $ac \ne 0$, we obtain a polynomial $Y = Y(X)$
in a new independent variable $X$. Notice we can write $Y(X) = Y(ax + b) = cy(x) + d$.
We consider these two polynomials to be equivalent. Of course, equivalent polynomials share
many properties. Most importantly, the real or complex number $x = r$ is a root of $y$ (so
$y(r) = 0$) if and only if the polynomial $Y$ evaluated at $X = ar + b$ equals $d$. In particular, if
the dependent variable is not subject to a translation (that is, if $d = 0$), then $r$ is a root of
$y(x)$ if and only if $ar + b$ is a root of $Y(X)$, or equivalently, $\frac{r - b}{a}$ is a root of $y(ax + b)$.

We emphasize that our changes of coordinates allow translations, resizing, and sign 
changes in both the independent and dependent variables. We may consider only real poly- 
nomials and work only with real affine maps. Let us examine the situation in a more formal 
way. 

First degree polynomials are also called affine maps of the complex plane C. They form 
a group under composition (composing two affine maps generates a third one). We denote 
by $L^{-1}(x)$ the inverse of the affine map $L(x)$ in this group¹. Our equivalence relation (which
we will denote by the symbol $\equiv$) is actually independent of the dependent variable $y$: The
polynomial $p(x)$ is equivalent to the polynomial $q(x)$ if $q(x)$ can be obtained from $p(x)$ by
pre-composition and post-composition with affine maps; that is, if and only if there exist
affine maps $L_1(x)$ and $L_2(x)$ such that
\[ q(x) = L_2(p(L_1(x))). \tag{39} \]
We note that every polynomial $p(x)$ is equivalent to itself since the identity map is affine. If
$p(x) \equiv q(x)$, then there exist affine maps $L_1$ and $L_2$ such that (39) holds, and thus
\[ p(x) = L_2^{-1}(q(L_1^{-1}(x))), \]
or $q(x) \equiv p(x)$. Finally, if $p_1(x) \equiv p_2(x)$ and $p_2(x) \equiv p_3(x)$, then there exist affine maps $L_i$,
$i = 1, 2, 3, 4$ such that
\[ p_2(x) = L_2(p_1(L_1(x))) \quad\text{and}\quad p_3(x) = L_4(p_2(L_3(x))). \]


¹ The notation is meant to emphasize the fact that affine maps are polynomials of degree one.

Thus
\[ p_3(x) = L_4(L_2(p_1(L_1(L_3(x))))) \]
or $p_1(x) \equiv p_3(x)$.


4. An example all high school math teachers should know: the cubic 
polynomial 


To understand what is going on with quadratic polynomials, one should be able to answer
several questions. How much of the development of §2 carries over to arbitrary real or
complex polynomials (of degree 3 or higher); how much is peculiar to quadratics? What do
these questions mean?

An example more challenging than the quadratic polynomial is the real or complex cubic
\[ p(x) = y = ax^3 + bx^2 + cx + d, \tag{40} \]
with $a$, $b$, $c$ and $d$ real or complex numbers and $a \ne 0$.
If $p(x) \in \mathbb{R}[x]$, then by the intermediate value theorem, $p(x)$ must have at least one real
root $r$. It follows that
\[ \frac{p(x)}{x - r} = ax^2 + \beta x + \gamma \]
for some real constants $\beta$ and $\gamma$, so that the analysis of the cubic $p(x) \in \mathbb{R}[x]$ can be reduced
to a study of the quadratic $ax^2 + \beta x + \gamma$. We follow a slight variation of this observation,
in part because it does not tell us how to find $r$ algebraically. But first, we make a small
digression to observe some facts about arbitrary real or complex polynomials.


5. Arbitrary real or complex polynomials 


Let us return to the arbitrary $n$-th degree real or complex polynomial (35). A basic fact
is the Fundamental theorem of algebra, whose easiest proof is via complex analysis (see,
for example, [1, page 122]): such a polynomial has precisely $n$ complex roots². Because of
the division algorithm, this implies that
\[ p(x) = a_n(x - r_1)(x - r_2)\cdots(x - r_n), \]
where each of the $r_i$ is a (complex) root of $p(x)$. In case the polynomial has real coefficients,
we find that if $r \in \mathbb{C}$ is a root of (35), then so is its complex conjugate $\bar{r}$. The proof of this
assertion is rather easy. The fact that $0 = p(r)$ tells us that
\[ 0 = \overline{p(r)} = \overline{\sum_{i=0}^{n} a_i r^i} = \sum_{i=0}^{n} a_i \bar{r}^i = p(\bar{r}), \]
so that $\bar{r}$ is a root of (35). We claim more: if the complex number $r$ is a root of multiplicity $k$
of (35) (that is, precisely $k$ of the $r_i$ equal $r$), then so is $\bar{r}$. The easiest proof of this assertion
is by induction on the degree of the polynomial. We may and do assume that $r \notin \mathbb{R}$ since
otherwise there is nothing to prove. We may also assume that $n \ge 3$ and $k \ge 2$ because
otherwise there once again is nothing to prove. Our assertion holds for $n = 3$, because in this
case we have at least one real root, so $k = 1$. Now assume $n > 3$ and the complex number $r$
is a root of $p(x)$ of multiplicity $k$. We already know $\bar{r} \ne r$ is also a root of $p(x)$. Note that
\[ (x - r)(x - \bar{r}) = x^2 - (r + \bar{r})x + r\bar{r}, \]


with both $r + \bar{r}$ and $r\bar{r}$ real numbers. Hence
\[ q(x) = \frac{p(x)}{(x - r)(x - \bar{r})} \]
is a real polynomial of degree $n - 2$ having $r$ as a complex root of multiplicity $k - 1$. By
complete induction $\bar{r}$ is also a complex root of multiplicity $k - 1$ of $q(x)$, and hence a root of
multiplicity $k$ of $p(x)$.

² Even for real polynomials, the roots may be complex.


6. Back to the cubic polynomial 


We begin a leisurely exploration of the real cubic. The study of the real cubic polynomial 
(40) does not require the material of the last section, except for the fact that non-real roots 
come in pairs (a complex number and its conjugate). We start with the useful, if trivial, 
general observation already encountered in §3. Let $a$ and $b \in \mathbb{R}$ with $a \ne 0$; then $r$ is a root
of $p(x)$ if and only if $\frac{r - b}{a}$ is a root of $p(ax + b)$.

As observed earlier, p(x) must have at least one real root. Therefore we have either 1, 2 
or 3 distinct real roots. We consider cases: 


(1) Exactly one real root.
• This root could have multiplicity three. A good example is the special case
$p(x) = x^3$ whose graph is shown below.

FIGURE 2. $y = x^3$

The general case $p_2(x) = a(x - b)^3$, where $a$ and $b$ are real numbers with $a \ne 0$, is
reduced to the special case by the change of coordinates $P = \frac{p_2}{a}$ and $X = x - b$.

• This root could not have multiplicity two. If it did, the remaining root would be
non-real and, since non-real roots come in conjugate pairs, $p(x)$ would have to have 4
roots, counting multiplicities, of course.

• This root could have multiplicity one. In this case, the polynomial must have
a pair of complex roots. A typical example here is $p_3(x) = x(x^2 + 1)$ whose
graph is shown below.

FIGURE 3. $y = x(x^2 + 1)$

The general equation here is of the form $p_4(x) = c(x - r)(x^2 + \alpha x + \beta)$ with $c$,
$r$, $\alpha$, $\beta$ real numbers, $c \ne 0$ and $\alpha^2 - 4\beta < 0$. If we make the change of variable
$x = aX + b$ with $a \ne 0$ and $b$ real, we transform the general equation to
\[ p_4(aX + b) = c(aX + b - r)(a^2X^2 + (2ab + a\alpha)X + (b^2 + \alpha b + \beta)). \]
We now set $b = r$ (thus placing the one real root at the origin) and consider
two cases:

- If $\alpha = -2r$, we conclude that $p_4(aX + r) = caX(a^2X^2 + (\beta - r^2))$ and
after further change of variables we reduce the equation to the form
\[ P(X) = X(X^2 + d), \quad d > 0. \]

- If $\alpha \ne -2r$, we cannot kill the first degree term in the quadratic polynomial
$a^2X^2 + (2ab + a\alpha)X + (b^2 + \alpha b + \beta)$. If $p(x)$ has complex roots at
$\rho$ and $\bar{\rho}$, then $p(aX + r)$ has complex roots at $\frac{\rho - r}{a}$ and $\frac{\bar{\rho} - r}{a}$. By properly
choosing $a$, we can ensure that the roots of the polynomial $p(aX + r)$ are
at $0$ and $e \pm i$ for some real number $e$. Thus we conclude, with proper
choices for $a$ and $b$,
\[ p(aX + b) = dX(X^2 - 2eX + (e^2 + 1)) \]
for some non-zero real number $d$. So the standard form of our polynomial
in this case is
\[ P(X) = X(X^2 - 2eX + (e^2 + 1)) \quad\text{for some } e \in \mathbb{R}. \]

We graph another example: $p_5(x) = (x + 2)(x^2 - 2x + 3)$, which will shortly lead
us to reconsider our approach.

(2) Exactly two distinct real roots.
In this case one of the roots must have multiplicity one and the other multiplicity
two. A typical example is $p_6(x) = x(x - 1)^2$ whose graph is shown below.

The general case is $p_7(x) = c(x - r_1)(x - r_2)^2$ with $c$, $r_1$ and $r_2$ real constants,
$c \ne 0$ and $r_1 \ne r_2$. The change of variables $x = (r_2 - r_1)X + r_1$, $P = \frac{p_7}{c(r_2 - r_1)^3}$ reduces the
general case to the typical example.

(3) Three distinct real roots.
A typical example is $p_8(x) = x(x + 1)(x - 1)$ whose graph is once again shown below.

The general case is $p(x) = c(x - r_1)(x - r_2)(x - r_3)$ with $0 \ne c \in \mathbb{R}$ and $r_1$, $r_2$
and $r_3$ three distinct real numbers. The change of variables $x = (r_2 - r_1)X + r_1$,
$P = \frac{p}{c(r_2 - r_1)^3}$ reduces the general case to the standard form
\[ P(X) = X(X - 1)(X - a), \quad\text{with } a \in \mathbb{R},\ 0 \ne a \ne 1. \]


FIGURE 4. $y = (x + 2)(x^2 - 2x + 3)$

FIGURE 5. $y = x(x - 1)^2$

FIGURE 6. $y = x(x + 1)(x - 1) = x(x^2 - 1)$


The reader should notice a striking similarity between the graphs of $p_5$ (Figure 4) and $p_8$
(Figure 6). This is, of course, not a coincidence. As can easily be seen,
\[ p_5(x) = p_8(x) + 6. \]




This suggests that we have not paid sufficient attention to the dependent variable. We 
now make a “bold” conjecture: If y is a cubic with 3 distinct real roots, there is a constant 
c such that y + c has precisely one simple real root (and thus also a pair of complex roots). 
In verifying this claim, we may assume that $\lim_{x \to -\infty} y(x) = -\infty$, since we can replace $y$ by
$-y$ if this is not the case. Let $r_1 < r_2 < r_3$ be the three roots of the cubic. From calculus we
know that the restriction of $y$ to the closed interval $[r_2, r_3]$ has a negative minimum $-c$ at
some point $x_1 \in (r_2, r_3)$. So the function $y + 2c$ has precisely one simple real root. Therefore,
in the classification of cubic polynomials up to equivalence, we may ignore the case of three 
distinct real roots. 
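The "bold" conjecture can be illustrated on the example $p_8(x) = x^3 - x$. The Python sketch below relies on a standard fact that is not derived in this text: the real cubic $x^3 + px + q$ has three distinct real roots when its discriminant $-4p^3 - 27q^2$ is positive and exactly one real root (plus a conjugate pair) when it is negative. The function name is hypothetical.

```python
import math


def discriminant(p, q):
    """Discriminant of the real cubic x^3 + p*x + q (standard formula)."""
    return -4 * p ** 3 - 27 * q ** 2


# p_8(x) = x^3 - x has three distinct real roots: positive discriminant.
print(discriminant(-1, 0))

# p_5(x) = p_8(x) + 6 has one real root: negative discriminant.
print(discriminant(-1, 6))

# The argument in the text: on [r_2, r_3] = [0, 1] the minimum of x^3 - x
# is -c, attained at x_1 = 1/sqrt(3); then y + 2c has a single real root.
x1 = 1 / math.sqrt(3)
c = -(x1 ** 3 - x1)
print(c, discriminant(-1, 2 * c))
```

Adding a constant only changes $q$, so once $27q^2$ exceeds $-4p^3$ the two outer real roots merge and leave the real line.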

The above analysis on the structure of the roots of the real cubic can also be obtained in 
a very straightforward manner. As we have seen, we can think of the general cubic as the
product of a linear term and a quadratic term. The quadratic term can have either no real 
roots, one real root of multiplicity two, or two distinct real roots, while the linear term must 
have exactly one simple real root. If the quadratic term has no real roots, then the cubic 
must have exactly one real root of multiplicity one and two complex conjugate roots. If the 
quadratic term has one real root of multiplicity two, then the cubic must have either two real 
roots (a simple one from the linear term and the one of multiplicity two from the quadratic 
term), or one real root of multiplicity three (if the linear and quadratic roots coincide). If 
the quadratic term has two distinct real roots, then the cubic has either three distinct real 
roots, or again, one root of multiplicity one, and one root of multiplicity two. 


7. Standard forms for cubics 


The above analysis does not produce a satisfactory set of standard forms for cubics. To 
obtain a more satisfactory set, we start with the arbitrary real cubic polynomial (40) and 
describe an algorithm consisting of a series of steps, not all intuitive, which reduces it to 
standard form.³ We work only with real affine maps in this reduction process.


(1) By rescaling the dependent variable, replacing $y$ by $ay$, we may assume $a = 1$.

(2) Completing the cube, replacing $x$ by $x - \frac{b}{3}$, allows us to assume $b = 0$.

(3) If $c = 0$ we proceed to the next step. Otherwise, we resize both the dependent
(replacing $y$ by $|c|^{3/2}y$) and independent variables (replacing $x$ by $x\sqrt{|c|}$), allowing
us to assume $c = \pm 1$. (The case $+1$ occurs when $c > 0$ and $-1$ when $c < 0$.)

(4) A final translation of the dependent variable (replacing $y$ by $y + d$) reduces the
original equation to standard form
\[ P(x) = x^3 + \epsilon x, \quad \epsilon = -1,\ 0 \text{ or } 1. \tag{41} \]
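The four steps above determine $\epsilon$ from the coefficients alone: after steps (1) and (2) the linear coefficient is $\frac{3ac - b^2}{3a^2}$, and step (3) keeps only its sign. A minimal Python sketch follows; the function name is hypothetical, and the argument `d` is accepted only to mirror (40), since the constant term plays no role in the class.

```python
from fractions import Fraction


def standard_form_epsilon(a, b, c, d):
    """Return epsilon in {-1, 0, 1} for the real cubic a*x^3 + b*x^2 + c*x + d.

    The cubic is equivalent, under real affine changes of both variables,
    to x^3 + epsilon*x as in (41).
    """
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    # Linear coefficient after dividing by a and replacing x by x - b/(3a).
    c_reduced = (3 * a * c - b * b) / (3 * a * a)
    return (c_reduced > 0) - (c_reduced < 0)   # sign of c_reduced


print(standard_form_epsilon(1, 0, -1, 0))    # x^3 - x
print(standard_form_epsilon(1, -2, 1, 0))    # x(x - 1)^2 = x^3 - 2x^2 + x
print(standard_form_epsilon(1, 0, 0, 5))     # x^3 + 5
print(standard_form_epsilon(2, 0, 3, -1))    # 2x^3 + 3x - 1
```

In particular $x^3 - x$ and $x(x - 1)^2$ receive the same $\epsilon$, although their root multiplicities differ.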


We show next that no two of these three polynomials are equivalent. As we have seen,
the family of cubics equivalent to $y = x^3$ can be written $c(ax + b)^3 + d$ with $a$, $b$, $c$ and $d$
real and $ac \ne 0$. But
\[ c(ax + b)^3 + d = a^3cx^3 + 3a^2bcx^2 + 3ab^2cx + b^3c + d. \]
So for $x^3 + x$ or $x^3 - x$ to be in this family, we would need $a^2bc = 0$ and $ab^2c \ne 0$, which is
impossible since $ac \ne 0$. Similarly,
the family of cubics equivalent to $x^3 + x$ can be written $c[(ax + b)^3 + (ax + b)] + d$. If we
want to reduce this to $x^3 - x$ we must have $b = 0 = d$, which gives the family $a^3cx^3 + acx$.
But now we would require $a^3c = 1$ and $ac = -1$ so $a^2 = -1$ and $a$ could not be real.


³ The values of the constants $a$, $b$, $c$ and $d$ keep changing during the process.


It is important to realize that while the development of this section did not rely on the 
intermediate value theorem, it has NOT established the existence of a real root for every 
cubic. 

Much of our discussion of cubics is accessible to bright high school students and it leads us
to several natural questions, some that students can pursue. Every third degree polynomial
belongs to one of three equivalence classes. How do we determine which equivalence class
contains a specific polynomial? Are there common characteristics shared by all polynomials
in an equivalence class? If so, what are they? Note that the multiplicity of roots is not constant
over an equivalence class. The polynomials $x^3 - x$ and $x(x - 1)^2$ are in the same equivalence
class; the first of these has three simple roots and the second only two distinct roots (precisely
one of these of multiplicity 2).

We have seen that there are only three equivalence classes of cubic equations and that
representatives for these classes are described by the standard forms (41). These three
equations are easily solved. If $\epsilon = 0$, then 0 is a root of multiplicity 3; if $\epsilon = -1$, then 0 and
$\pm 1$ are simple roots, and if $\epsilon = 1$, then 0 and $\pm i$ are simple roots. Does this information help
us solve the general cubic $p(x)$ given by (40)? We know that there are four real constants $\alpha$,
$\beta$, $\gamma$ and $\delta$ such that $\alpha\gamma \ne 0$ and
\[ ax^3 + bx^2 + cx + d = \alpha\left[(\gamma x + \delta)^3 + \epsilon(\gamma x + \delta)\right] + \beta. \tag{42} \]


The unknown constants are easily determined. Equating the coefficients of the same powers
of $x$ in (42) leads to four equations:
\[ a = \alpha\gamma^3, \quad b = 3\alpha\gamma^2\delta, \quad c = \alpha\gamma\left[3\delta^2 + \epsilon\right], \quad\text{and}\quad d = \alpha\delta\left[\delta^2 + \epsilon\right] + \beta, \]
and we seem to have replaced solving a single cubic equation with solving four equations in
four unknowns that also involve cubics, which is not obviously a simpler problem. Again a change
of course is useful. There exist affine maps $L_1$ and $L_2$ such that
\[ L_2\left[a(L_1(x))^3 + b(L_1(x))^2 + cL_1(x) + d\right] = x^3 + \epsilon x \]
with $\epsilon = 0$ or $\pm 1$. The algorithm described at the beginning of this section tells us how to
compute the two affine maps. We find that
\[ L_1(x) = x - \frac{b}{3a} \quad\text{and}\quad L_2(x) = \frac{1}{a}x - \frac{9ad - bc}{9a^2} \]
when $b^2 - 3ac = 0$ (in this case $\epsilon = 0$). If $b^2 - 3ac \ne 0$, then, abbreviating
$s = \frac{1}{a}\sqrt{\frac{|b^2 - 3ac|}{3}}$,
\[ L_1(x) = sx - \frac{b}{3a} \quad\text{and}\quad L_2(x) = \frac{1}{as^3}\left(x - \frac{2b^3 - 9abc + 27a^2d}{27a^2}\right), \]
with $\epsilon = 1$ if $b^2 - 3ac < 0$, and $\epsilon = -1$ if $b^2 - 3ac > 0$.
Inverting $L_1$ and $L_2$, we conclude that if $b^2 - 3ac = 0$ then
\[ \alpha = a, \quad \beta = \frac{9ad - bc}{9a}, \quad \gamma = 1, \quad \delta = \frac{b}{3a}, \]
and
\[ \alpha = as^3, \quad \beta = \frac{2b^3 - 9abc + 27a^2d}{27a^2}, \quad \gamma = \frac{1}{s}, \quad \delta = \frac{b}{3as} \]
if $b^2 - 3ac \ne 0$. We are looking for the zeros (roots) of the left hand side of (42). Using
the right hand side of that equation we need to find the complex numbers $r$ such that for
$P(x) = x^3 + \epsilon x$, we have $P(\gamma r + \delta) = -\frac{\beta}{\alpha}$. So⁴ the problem reduces to solving the equation
\[ x^3 + \epsilon x - 2\eta = 0, \qquad \epsilon = \pm 1,\ \eta = -\frac{\beta}{2\alpha} \in \mathbb{R}. \tag{43} \]

The above discussion assumed that we were dealing with a real cubic. What changes 
need to be made for a general cubic? In studying the complex cubic, it is natural to consider 
complex affine maps in the definition of equivalence of polynomials. One can easily be 
convinced that in this setting the two cubics $x^3 + x$ and $x^3 - x$ are equivalent, but neither
of these is equivalent to the cubic $x^3$, resulting in two rather than three equivalence classes
of complex cubics.
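The constants in (42) can be computed and checked numerically. The Python sketch below follows the reduction algorithm of this section under one stated assumption: the scaling factor is taken to be $s = \frac{1}{a}\sqrt{|b^2 - 3ac|/3}$, so that $\gamma = 1/s$, $\delta = \frac{b}{3as}$, $\alpha = as^3$ and $\beta$ is the value of the cubic at $x = -\frac{b}{3a}$. The function names are hypothetical and floating-point arithmetic is used.

```python
import math


def standard_form_constants(a, b, c, d):
    """Return (alpha, beta, gamma, delta, eps) satisfying identity (42).

    a*x^3 + b*x^2 + c*x + d
        = alpha*((gamma*x + delta)^3 + eps*(gamma*x + delta)) + beta
    """
    disc = b * b - 3 * a * c
    # Value of the cubic at x = -b/(3a); this is the constant beta.
    beta = (2 * b ** 3 - 9 * a * b * c + 27 * a * a * d) / (27 * a * a)
    if disc == 0:
        return a, beta, 1.0, b / (3 * a), 0
    eps = 1 if disc < 0 else -1
    s = math.sqrt(abs(disc) / 3) / a      # assumed choice of scaling factor
    return a * s ** 3, beta, 1 / s, b / (3 * a * s), eps


def both_sides(a, b, c, d, x):
    """Evaluate the left and right hand sides of (42) at x."""
    alpha, beta, gamma, delta, eps = standard_form_constants(a, b, c, d)
    u = gamma * x + delta
    return a * x ** 3 + b * x ** 2 + c * x + d, alpha * (u ** 3 + eps * u) + beta


for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    print(both_sides(2, -3, 5, 7, x))
```

With these constants, equation (43) is obtained by setting $\eta = -\frac{\beta}{2\alpha}$.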


8. Solving the cubic 


We continue with the notation of the previous section and aim to solve (43). Assume,
however, that the two constants $\epsilon$ and $\eta \in \mathbb{C}^*$. We follow a method discovered by the
sixteenth century Italian mathematician Girolamo Cardano (and probably many others).
Let us write

\[ x = y + z. \]

This may seem like an unnecessary complication, but it actually gives us additional freedom,
because we will be able to introduce a convenient side relation between $y$ and $z$. Algebraic
manipulations transform (43) to

\[ (y^3 + z^3) + (y + z)(3yz + \epsilon) - 2\eta = 0. \]

This last formula reveals the appropriate side condition:

\[ 3yz + \epsilon = 0, \]

and thus leads us to consider the system

\[ yz = -\frac{\epsilon}{3}, \qquad y^3 + z^3 = 2\eta. \]

But look! This reduces the problem to solving a quadratic equation (in $y^3$)

\[ y^6 - 2\eta y^3 - \frac{\epsilon^3}{27} = 0. \]

(We note that $y \neq 0$ since $\epsilon \neq 0$.) The solution is

\[ y^3 = \eta \pm \left(\eta^2 + \frac{\epsilon^3}{27}\right)^{1/2}. \]


Note that this quadratic equation in $y^3$ appears to give six solutions: one “positive” and
one “negative” for each cube root of $y^3$, where “positive” and “negative” denote a choice of
sign in the quadratic formula. As we know, the cubic has at most three distinct roots. We
must appropriately choose three from the six possibilities. Note that the possible 6 values of
$y$ are identical to the 6 possible values of $z$. We must now use the side condition $yz = -\frac{\epsilon}{3}$.
Thus each of the possible values of $y$ corresponds to a unique value of $z$. By symmetry, the
6 possible choices for $y$ result in only 3 possibilities for $y + z$. In practice, it is not necessary
to find all the possible solutions of the quadratic or cubic equations. Once we have a single
solution to the cubic, chosen arbitrarily, we can factor this out of the original equation,
leaving ourselves with a well understood quadratic term.

⁴ The cases with $\epsilon = 0$ or $\eta = 0$ are of course trivial.
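The choice of three values among the six can be made mechanical: pick one square root, take the three cube roots of the resulting $y^3$, and let the side condition determine $z$. A minimal Python sketch, assuming $\epsilon \neq 0$ and $\eta \neq 0$ and working in floating point (the function name `cardano` is hypothetical):

```python
import cmath


def cardano(eps, eta):
    """Roots of x**3 + eps*x - 2*eta = 0 for nonzero complex eps and eta."""
    # y**3 is a root of the quadratic t**2 - 2*eta*t - eps**3/27 = 0.
    y_cubed = eta + cmath.sqrt(eta * eta + eps**3 / 27)
    y0 = y_cubed ** (1 / 3)              # principal cube root
    omega = cmath.exp(2j * cmath.pi / 3)  # primitive cube root of unity
    roots = []
    for k in range(3):
        y = y0 * omega**k                 # the three cube roots of y**3
        z = -eps / (3 * y)                # side condition 3yz + eps = 0
        roots.append(y + z)
    return roots
```

Choosing the other sign of the square root interchanges the roles of $y$ and $z$ and yields the same three sums.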


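EXAMPLE 8.1. As an example of Cardano’s method we solve the cubic $x^3 - 3x + 1 = 0$.
For this example, using the notation introduced above, $\epsilon = -3$ and $\eta = -\frac{1}{2}$. We are led to
solving the quadratic (in $y^3$) $y^6 + y^3 + 1 = 0$. The resulting 6 possible values of $y$ are $e^{\frac{2\pi i}{9}}$,
$e^{\frac{4\pi i}{9}}$, $e^{\frac{8\pi i}{9}}$, $e^{\frac{10\pi i}{9}}$, $e^{\frac{14\pi i}{9}}$ and $e^{\frac{16\pi i}{9}}$. These are also the possible 6 values of $z$. Since we know
that $yz = 1$ (leaving us precisely 3 pairs of solutions), the solutions ($y + z$) to our cubic are
$x = e^{\frac{2\pi i}{9}}\left(1 + e^{\frac{14\pi i}{9}}\right)$, $e^{\frac{4\pi i}{9}}\left(1 + e^{\frac{10\pi i}{9}}\right)$ and $e^{\frac{8\pi i}{9}}\left(1 + e^{\frac{2\pi i}{9}}\right)$.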
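The pairing in Example 8.1 can be verified numerically. In this sketch (variable and function names are ours) each $y = e^{k\pi i/9}$ with $k = 2, 4, 8$ is paired with $z = 1/y$, as the side condition $yz = 1$ requires:

```python
import cmath


def unit(k):
    """Return exp(k*pi*i/9)."""
    return cmath.exp(1j * cmath.pi * k / 9)


# Pair each y with z = 1/y (the side condition yz = 1) and form x = y + z.
solutions = [unit(k) + 1 / unit(k) for k in (2, 4, 8)]

# Each residual |x**3 - 3x + 1| should vanish up to rounding error.
residuals = [abs(x**3 - 3 * x + 1) for x in solutions]
```

The three sums are real up to rounding error, as expected, since here $z$ is the complex conjugate of $y$.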


The above method solves the general cubic

\[ x^3 + b_2 x^2 + b_1 x + b_0 \]

which reduces to the form

\[ x^3 + a_1 x + a_0 \]

after replacing $x$ by $x - \frac{b_2}{3}$. Observe that finding the solutions involves the usual field
operations (on $(\mathbb{C}, +, \cdot)$) and the extraction of square and cube roots.
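The substitution $x \mapsto x - \frac{b_2}{3}$ can be carried out once and for all. The sketch below uses exact rational arithmetic and returns the coefficients $a_1$ and $a_0$ of the reduced form; the function name `depress_cubic` is ours.

```python
from fractions import Fraction


def depress_cubic(b2, b1, b0):
    """Coefficients (a1, a0) of x**3 + a1*x + a0, obtained from
    x**3 + b2*x**2 + b1*x + b0 by replacing x with x - b2/3."""
    b2, b1, b0 = Fraction(b2), Fraction(b1), Fraction(b0)
    a1 = b1 - b2**2 / 3
    a0 = b0 - b1 * b2 / 3 + 2 * b2**3 / 27
    return a1, a0
```

For instance, $x^3 - 6x^2 + 11x - 6$, with roots 1, 2 and 3, becomes $x^3 - x$, with roots $-1$, 0 and 1.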


9. Solving the quartic 


Our aim is to solve the general quartic

\[ x^4 + b_3 x^3 + b_2 x^2 + b_1 x + b_0 \]

which reduces to

\[ x^4 + a_2 x^2 + a_1 x + a_0 \]

by replacing $x$ by $x - \frac{b_3}{4}$. We follow a method discovered by Lodovico Ferrari, who certainly
was familiar with Cardano’s work. To solve the reduced quartic, we factor it into a product
of quadratics

\[ x^4 + a_2 x^2 + a_1 x + a_0 = (x^2 + \alpha x + \beta)(x^2 + \gamma x + \delta). \tag{44} \]

As a consequence of the fundamental theorem of algebra, we know that such a factorization
is possible. What is not obvious is that we can find an algorithm for determining the four
constants $\alpha$, $\beta$, $\gamma$ and $\delta$ in terms of the given $a_2$, $a_1$ and $a_0$.

Equating the coefficients of the $x^3$ terms in (44), we obtain $\gamma = -\alpha$, the first step in
reducing the problem to more manageable size.

The next step is to equate the coefficients of the $x^2$ terms in the two sides of (44) to
obtain

\[ -\alpha^2 + \beta + \delta = a_2. \]

At this point we rely on an inspired “guess” (probably reached by Ferrari after a number
of other trials):

\[ \beta = \frac{1}{2}\left(a_2 + \alpha^2 + \eta\right) \quad\text{and}\quad \delta = \frac{1}{2}\left(a_2 + \alpha^2 - \eta\right), \]

thus reducing the task of finding two constants $\beta$ and $\delta$ to the apparently simpler task of
finding one constant $\eta$. To complete the algorithm, we need to evaluate $\alpha$ and $\eta$.




Equating the coefficients of the $x$ terms in the two sides of (44), we obtain

\[ \frac{1}{2}\alpha\left(a_2 + \alpha^2 - \eta\right) - \frac{1}{2}\alpha\left(a_2 + \alpha^2 + \eta\right) = a_1, \]

from which we easily conclude that

\[ \eta = -\frac{a_1}{\alpha}, \]

provided that $\alpha \neq 0$.⁵ We are left with the task of evaluating the last unknown constant $\alpha$.
If we equate the constant terms in the two sides of (44), we obtain

\[ a_2^2 + 2a_2\alpha^2 + \alpha^4 - \frac{a_1^2}{\alpha^2} = 4a_0, \]

which, after multiplication by $\alpha^2$, is a cubic in $\alpha^2$ that we know how to solve from the results of the previous section.


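EXAMPLE 8.2. We consider the quartic $x^4 + 2x + 3$. Here $a_2 = 0$, $a_1 = 2$ and $a_0 = 3$, so the resulting cubic in $\alpha^2$ is $\alpha^6 - 12\alpha^2 - 4 = 0$.
This equation is already in the reduced form treated in the last section and can be solved by Cardano’s method.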
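Ferrari's algorithm translates directly into a short program. The following Python sketch is an illustration under assumptions, not a robust solver: it requires $a_1 \neq 0$ (so that $\alpha \neq 0$), works in floating point complex arithmetic, and all function names are ours. The cubic in $t = \alpha^2$ is solved by Cardano's method from the previous section.

```python
import cmath


def cubic_root(c2, c1, c0):
    """One complex root of t**3 + c2*t**2 + c1*t + c0 (Cardano's method)."""
    # Shift t = s - c2/3 to obtain the reduced cubic s**3 + p*s + q.
    p = c1 - c2**2 / 3
    q = c0 - c1 * c2 / 3 + 2 * c2**3 / 27
    root_term = cmath.sqrt(q * q / 4 + p**3 / 27)
    # Take the sign giving the larger |y**3| to avoid dividing by zero below.
    y_cubed = max(-q / 2 + root_term, -q / 2 - root_term, key=abs)
    if y_cubed == 0:                # then p == q == 0 and s = 0 is a triple root
        return -c2 / 3
    y = y_cubed ** (1 / 3)
    return y - p / (3 * y) - c2 / 3


def ferrari(a2, a1, a0):
    """Return (alpha, beta, delta) with x**4 + a2*x**2 + a1*x + a0
    == (x**2 + alpha*x + beta)*(x**2 - alpha*x + delta); needs a1 != 0."""
    t = cubic_root(2 * a2, a2**2 - 4 * a0, -a1**2)   # t = alpha**2
    alpha = cmath.sqrt(t)
    eta = -a1 / alpha
    return alpha, (a2 + t + eta) / 2, (a2 + t - eta) / 2


def quadratic_roots(b, c):
    """Both roots of x**2 + b*x + c."""
    s = cmath.sqrt(b * b - 4 * c)
    return [(-b + s) / 2, (-b - s) / 2]


def quartic_roots(a2, a1, a0):
    """The four roots of the reduced quartic x**4 + a2*x**2 + a1*x + a0."""
    alpha, beta, delta = ferrari(a2, a1, a0)
    return quadratic_roots(alpha, beta) + quadratic_roots(-alpha, delta)
```

For the quartic $x^4 + 2x + 3$, `quartic_roots(0, 2, 3)` returns four complex numbers at which the quartic vanishes up to rounding error.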


10. Concluding remarks 


We proceed to some remarks about the general case of a real polynomial $p(x)$ of arbitrary
degree $n > 3$. If $n$ is odd, it must have at least one real root, say at $x = r$. Dividing $p(x)$ by
$x - r$ we obtain a polynomial of even degree $n - 1$. A polynomial of even degree may have
a number of real roots and an even number of non-real complex roots that occur as conjugate
pairs. Thus $p(x)$ has $m$ real roots and $\frac{n-m}{2}$ pairs of non-real complex conjugate roots (counted with multiplicity). Note
that $n$ and $m$ must have the same parity. This analysis does not reveal how many families
of polynomials of a given degree there are, nor what they look like. It leads to a series of
questions paralleling those posed at the beginning of our discussion of the cubic.

We conclude by returning to the questions which preceded our discussion of the cubic.
We have seen how the root structure of the quadratic is a special case of the root structure
of arbitrary polynomials, while the standard form of a quadratic is very special. For degree
2, there is one standard form. For degree 3, there are three standard forms. What happens
for polynomials of degree $n \geq 4$? For quartics an analysis similar to that used for cubics will
work (how many equivalence classes will result?), not quite so for polynomials of degree five
or higher. But this leads to a different fascinating chapter in the study of algebra.


11. A moduli (parameter) count 


The general cubic depends on four parameters. Our equivalence relation on polynomials
of degree three also depends on four parameters (two each for pre and post composition). It
thus seems reasonable, as we discovered, that there are only three equivalence classes. The
quartic depends on five parameters and we thus expect that there should be one-parameter
families of equivalence classes of quartics.


⁵ The condition $\alpha = 0$ implies that the reduced quartic is of the form $x^4 + a_2 x^2 + a_0$, a quadratic in $x^2$,
and hence easily solved. Thus we may assume that $\alpha \neq 0$.


EXERCISES
These exercises are more open ended than those in previous chapters; they are projects for the
readers requiring thinking as well as reading of the literature on the subject.

(1) Let $n$ be a positive integer. On how many parameters does the space of equivalence
classes of $n$-th degree polynomials depend? Does it make any difference if one
considers polynomials with coefficients in the rings $\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}$ or $\mathbb{C}$ and the affine
maps with coefficients in the corresponding rings?

(2) Complete the work needed to compute the 4 roots of Example 8.2.

(3) Find the roots of each of the following polynomials.

(a) $1 + 2x + 2x^2 + x^3$

(b) $1 + 2x + 3x^2 + 4x^3$

(c) $1 + 2x + 2x^2 + 2x^3 + x^4$

(d) $1 + 2x + 3x^2 + 4x^3 + 5x^4$

(4) Complete the details for the classification of complex cubics into two equivalence
classes.

(5) Determine the set of standard forms for quartics over the reals and complex numbers.
On how many parameters does each of the equivalence classes depend?


CHAPTER 9 
Nonsolvability by radicals 


The main aim of this chapter is to prove that polynomials of degree greater than 4 cannot,
in general, be solved by radicals, that is, by formulae involving only field operations and the
extraction of roots. We follow a path described in Chapter VI of [8]. To avoid
some technical complications, we assume that all fields under discussion are contained in $\mathbb{C}$;
they automatically contain $\mathbb{Q}$. THE MATERIAL IN THIS CHAPTER IS IN PRELIMINARY
FORM; MANY REVISIONS ARE REQUIRED.


1. Algebraic extensions of fields 
We begin with a general

DEFINITION 9.1. Let $F \subset \mathbb{C}$ be a field. A number $\alpha \in \mathbb{C}$ is algebraic over $F$ if there
exists a polynomial $P(x) \in F[x]$ of positive degree with $P(\alpha) = 0$. Thus $\alpha$ is a root of $P(x)$
and $P(x)$ vanishes at $\alpha$. A number $\alpha \in \mathbb{C}$ is algebraically independent or transcendental
over $F$ if it is not algebraic. The definitions make perfectly good sense for integral domains
$F \subset \mathbb{C}$.


REMARK 9.2. If the polynomial $P(x)$ of the last definition has degree 1, then $\alpha \in F$ if
the latter is a field. Hence, for fields, the interesting cases involve polynomials of degree
at least 2. The complex numbers $\pm\sqrt{2}$ and $\pm i$ are algebraic over $\mathbb{Q}$, satisfying the polynomial
equations $x^2 - 2 = 0$ and $x^2 + 1 = 0$, respectively. The reason for allowing polynomials of
degree one to appear in the last definition is to guarantee that all the elements of the field
$F$ are algebraic over $F$.


DEFINITION 9.3. Let $F$ be a subfield of $E$. We view $E$ as an $F$-vector space and call it
an extension of the field $F$, and a finite extension if the dimension of $E$ as an $F$-vector space
is finite. We say that $E$ is algebraic over $F$ if every $e \in E$ is algebraic over $F$.


THEOREM 9.4. If $E$ is a finite extension of the field $F$, then every element $\alpha \in E$ is
algebraic over $F$.


PROOF. If $\alpha \in F$ there is nothing to prove. So assume that $\alpha \notin F$. If $n$ is bigger than
or equal to the dimension of $E$ as an $F$-vector space, then the $n + 1$ elements $1, \alpha, \ldots, \alpha^n$ in
$E$ cannot be linearly independent over $F$. So there exist elements $a_i \in F$, not all zero, such
that

\[ a_0 + a_1\alpha + \ldots + a_n\alpha^n = 0. \tag{45} \]

The degree of the last polynomial must be at least two since otherwise $\alpha$ would belong to
$F$. □
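The proof of Theorem 9.4 is constructive: a linear dependence among the powers of $\alpha$ can be found by Gaussian elimination. As an illustration (our choice of example, not taken from the text), take $E = \mathbb{Q}(\sqrt{2}, \sqrt{3})$ with the elements $1, \sqrt{2}, \sqrt{3}, \sqrt{6}$ spanning $E$ over $\mathbb{Q}$, and $\alpha = \sqrt{2} + \sqrt{3}$. The helper names below are hypothetical.

```python
from fractions import Fraction


def mul(u, v):
    """Product of u and v, written in terms of 1, sqrt2, sqrt3, sqrt6."""
    a0, a1, a2, a3 = u
    b0, b1, b2, b3 = v
    return (
        a0 * b0 + 2 * a1 * b1 + 3 * a2 * b2 + 6 * a3 * b3,
        a0 * b1 + a1 * b0 + 3 * (a2 * b3 + a3 * b2),
        a0 * b2 + a2 * b0 + 2 * (a1 * b3 + a3 * b1),
        a0 * b3 + a3 * b0 + a1 * b2 + a2 * b1,
    )


def linear_relation(vectors):
    """Return c (not all zero) with sum(c[k] * vectors[k]) == 0,
    or None if the vectors are linearly independent over Q."""
    n = len(vectors)
    basis = []  # triples (pivot index, reduced vector, combination producing it)
    for k, v in enumerate(vectors):
        vec = [Fraction(x) for x in v]
        combo = [Fraction(int(j == k)) for j in range(n)]
        for pivot, bvec, bcombo in basis:
            factor = vec[pivot]
            if factor != 0:
                vec = [x - factor * y for x, y in zip(vec, bvec)]
                combo = [x - factor * y for x, y in zip(combo, bcombo)]
        pivot = next((i for i, x in enumerate(vec) if x != 0), None)
        if pivot is None:          # vectors[k] depends on the earlier vectors
            return combo
        lead = vec[pivot]
        basis.append((pivot, [x / lead for x in vec], [x / lead for x in combo]))
    return None


alpha = (0, 1, 1, 0)               # sqrt2 + sqrt3
powers = [(1, 0, 0, 0)]            # 1, alpha, alpha**2, alpha**3, alpha**4
for _ in range(4):
    powers.append(mul(powers[-1], alpha))
relation = linear_relation(powers)  # coefficients a_0, ..., a_4 as in (45)
```

The relation found is $1 - 10\alpha^2 + \alpha^4 = 0$, so $\alpha$ is a root of $x^4 - 10x^2 + 1$.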


PROPOSITION 9.5. Let $\alpha \in \mathbb{C}$ be algebraic over the field $F$. Let $J$ be the ideal of polynomials
in $F[x]$ which vanish at $\alpha$. Let $p(x)$ be the monic polynomial that generates $J$. Then
$p(x)$ is irreducible.


PROOF. Suppose that $p(x) = g(x)h(x)$ is a factorization of the polynomial $p(x)$ and the
degrees of $g(x)$ and $h(x)$ are strictly less than the degree of $p(x)$. Since $p(\alpha) = 0$, either
$g(\alpha) = 0$ or $h(\alpha) = 0$. Since $p(x)$ is a nonzero polynomial of minimal degree in $J$, we have reached a
contradiction. □


DEFINITION 9.6. With the notation of the last proposition, we may assume that $p(x)$ is
monic. It is then uniquely determined by $\alpha$ and $F$, and we call it the irreducible polynomial
of $\alpha$ over $F$. Its degree is the degree of $\alpha$ over $F$.


REMARK 9.7. Note that $\alpha \in F$ if and only if $p(x) = x - \alpha$ if and only if the degree of $\alpha$
over $F$ is one.


THEOREM 9.8. (a) Let $\alpha \in \mathbb{C}$ be algebraic over the field $F$. Let $n$ be the degree of its
irreducible monic polynomial $p(x)$ over $F$. Then the $F$-vector space¹, $F(\alpha)$, generated by
$1, \alpha, \ldots, \alpha^{n-1}$ is a field, and the dimension of $F(\alpha)$ as an $F$-vector space is $n$ (also denoted
by $[F(\alpha) : F]$).

(b) Let $\alpha \in \mathbb{C}$ be algebraically independent over the field $F$. Then the $F$-vector space, $E$,
generated by $1, \alpha, \ldots, \alpha^n, \ldots$ is infinite dimensional and an integral domain, and the map $\theta$
that is the identity on $F$ and sends $\alpha$ to $x$ extends to a ring isomorphism $\theta: E \to F[x]$.

(c) Under the same assumptions and notation as in the previous part, let $F(\alpha)$ be the smallest
subfield of $\mathbb{C}$ containing $F$ and $\alpha$. Then $F(\alpha)$ is the field of fractions of the integral domain
$E$ and is isomorphic to the field of rational functions $F(x)$.


PROOF. (a) Let $f(x) \in F[x]$. By the division algorithm, we can find polynomials $q(x)$
and $r(x) \in F[x]$ with

\[ f(x) = q(x)p(x) + r(x) \quad\text{and}\quad \deg r(x) < \deg p(x). \]

Thus

\[ f(\alpha) = r(\alpha). \]

Denote by $E$ the $F$-vector space generated by $1, \alpha, \ldots, \alpha^{n-1}$. (We need to show that $E$ is a
field; hence $F(\alpha)$.) These $n$ elements are linearly independent over $F$, since a nontrivial linear
relation among them would produce a nonzero polynomial of degree less than $n$ that vanishes
at $\alpha$; hence the dimension of $E$ is $n$. Let $a$ and $b \in E$. Then there exist polynomials $f_1(x)$ and $f_2(x) \in F[x]$,
each of degree less than $n$, with $f_1(\alpha) = a$ and $f_2(\alpha) = b$. Thus (using $f(x) = f_1(x)f_2(x)$)

\[ ab = f_1(\alpha)f_2(\alpha) = r(\alpha) \in E. \]

Now suppose $f(x) \in F[x]$ has degree less than $n$ and $f(\alpha) \neq 0$. Then $p(x)$ does
not divide $f(x)$. Since $p(x)$ is irreducible, we conclude that the polynomials $p(x)$ and $f(x)$
must be relatively prime and hence there exist polynomials $g(x)$ and $h(x) \in F[x]$ such that
$g(x)f(x) + h(x)p(x) = 1$. Hence $g(\alpha)f(\alpha) = 1$, showing that every non-zero element of $E$
is invertible. This suffices to establish that the ring $E$ is a field.

(b) It is clear that $E$ is a ring. It is an infinite dimensional $F$-vector space since for every
positive integer $n$, the vectors $1, \alpha, \ldots, \alpha^n$ are linearly independent over $F$: a relation
of the form (45) would imply that $\alpha$ is algebraic over $F$. The ring $E$ is an integral domain
since it is contained in $\mathbb{C}$. The $F$-linear map $\theta$ is extended to the vector space $E$ by defining
$\theta(\alpha^n) = x^n$. It is clearly a ring isomorphism.

(c) It is clear that the field of fractions of $E$ is equal to $F(\alpha)$. The map $\theta: F(\alpha) \to F(x)$ is
defined as the identity on $F$ and by sending $\alpha$ to the indeterminate $x$. It is rather obvious
how to extend it to all of $F(\alpha)$ to obtain a ring homomorphism to the space of rational
functions $F(x)$ over $F$. To show that $\theta$ is an isomorphism between fields, all we need to
show is that it is surjective. So let $p(x)$ and $q(x) \in F[x]$ with $q(x) \neq 0$. Hence $\frac{p(x)}{q(x)}$ is an
arbitrary element of $F(x)$. Certainly both $p(\alpha)$ and $q(\alpha) \in F(\alpha)$. We claim that $q(\alpha) \neq 0$
since otherwise $\alpha$ would (once again) be algebraic over $F$. It is clear that $\theta\left(\frac{p(\alpha)}{q(\alpha)}\right) = \frac{p(x)}{q(x)}$. □

¹ The notation implies that the vector space we obtain is a field, as we establish below.
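Part (a) of the proof shows how to invert a nonzero element of $F(\alpha)$: the extended Euclidean algorithm produces $g(x)$ with $g(x)f(x) + h(x)p(x) = 1$. A sketch over $F = \mathbb{Q}$ with exact rational arithmetic follows; polynomials are coefficient lists with the constant term first, and all function names are ours.

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zero coefficients."""
    while p and p[-1] == 0:
        p = p[:-1]
    return p


def poly_mul(f, g):
    out = [Fraction(0)] * (len(f) + len(g) - 1) if f and g else []
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return trim(out)


def poly_sub(f, g):
    n = max(len(f), len(g))
    f = list(f) + [0] * (n - len(f))
    g = list(g) + [0] * (n - len(g))
    return trim([a - b for a, b in zip(f, g)])


def poly_divmod(f, g):
    """Quotient and remainder of f divided by the nonzero polynomial g."""
    f = trim([Fraction(c) for c in f])
    g = trim([Fraction(c) for c in g])
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 0)
    while len(f) >= len(g):
        shift = len(f) - len(g)
        coeff = f[-1] / g[-1]
        q[shift] = coeff
        f = trim([c - coeff * g[i - shift] if i >= shift else c
                  for i, c in enumerate(f)])
    return q, f


def inverse_mod(f, p):
    """Return g with g*f == 1 modulo p, assuming f and p are relatively prime."""
    # Extended Euclid, keeping only the multiplier of f: r_i == s_i * f (mod p).
    r0, s0 = trim([Fraction(c) for c in p]), []
    r1, s1 = trim([Fraction(c) for c in f]), [Fraction(1)]
    while len(r1) > 1:
        q, r = poly_divmod(r0, r1)
        r0, s0, r1, s1 = r1, s1, r, poly_sub(s0, poly_mul(q, s1))
    if not r1:
        raise ValueError("f and p are not relatively prime")
    return poly_divmod([c / r1[0] for c in s1], p)[1]


# Inverse of 1 + alpha in Q(alpha), where alpha**3 = 2.
inverse = inverse_mod([1, 1], [-2, 0, 0, 1])
```

With $p(x) = x^3 - 2$ and $f(x) = 1 + x$ this gives $(1 + \alpha)^{-1} = \frac{1}{3}(1 - \alpha + \alpha^2)$ for $\alpha = \sqrt[3]{2}$, in agreement with $(1 + \alpha)(1 - \alpha + \alpha^2) = 1 + \alpha^3 = 3$.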


DEFINITION 9.9. In general, if $E$ is a finite dimensional extension of $F$, we denote by
$[E : F]$ the dimension of $E$ as an $F$-vector space.


THEOREM 9.10. If $E_1$ is a finite extension of the field $F$ and $E_2$ is a finite extension of
$E_1$, then $E_2$ is a finite extension of $F$ and

\[ [E_2 : F] = [E_2 : E_1][E_1 : F]. \]


DEFINITION 9.11. Let $\alpha_1$ and $\alpha_2 \in \mathbb{C}$ be algebraic over a field $F$. Then $\alpha_2$ is obviously
algebraic over $F(\alpha_1)$ and we can form the field $F(\alpha_1)(\alpha_2)$. Since any field that contains $F$,
$\alpha_1$ and $\alpha_2$ will contain $F(\alpha_1)(\alpha_2) = F(\alpha_2)(\alpha_1)$, this is the smallest field that contains $F$, $\alpha_1$
and $\alpha_2$; we will denote it by $F(\alpha_1, \alpha_2)$. This field is algebraic over $F$ (by Theorems 9.10 and 9.4). In particular, sums
and products of algebraic numbers are algebraic. For if $\alpha_1$ and $\alpha_2$ are algebraic over the field
$F$, then both $\alpha_1 + \alpha_2$ and $\alpha_1\alpha_2 \in F(\alpha_1, \alpha_2)$. By induction, if $\alpha_1, \alpha_2, \ldots, \alpha_r$ are algebraic
over $F$, we obtain the field

\[ F(\alpha_1, \alpha_2, \ldots, \alpha_r), \]

by adjoining $\alpha_1, \alpha_2, \ldots, \alpha_r$ to $F$.

REMARK 9.12. In the notation of the last definition, we have the equality

\[ [F(\alpha_1, \ldots, \alpha_r) : F] = [F(\alpha_1, \ldots, \alpha_r) : F(\alpha_1, \ldots, \alpha_{r-1})]\,[F(\alpha_1, \ldots, \alpha_{r-1}) : F(\alpha_1, \ldots, \alpha_{r-2})] \cdots [F(\alpha_1) : F]. \]

Hence $F(\alpha_1, \alpha_2, \ldots, \alpha_r)$ is a finite extension of $F$, and Theorem 9.21 will tell us that $F(\alpha_1, \alpha_2, \ldots, \alpha_r) = F(\gamma)$ for some $\gamma \in \mathbb{C}$ that is
algebraic over $F$.


Let $F$ be a field. The discussion of transcendental numbers over $F$ is similar, but not
completely parallel, to the discussion of algebraic numbers over $F$.


DEFINITION 9.13. Let $F$ be a subfield of $\mathbb{C}$ and $\alpha_1, \alpha_2, \ldots, \alpha_r \in \mathbb{C}$. Let $F_0 = F$ and
define inductively $F_i = F_{i-1}(\alpha_i)$ for $i = 1, 2, \ldots, r$. We will say that the $r$ complex numbers
$\alpha_1, \alpha_2, \ldots, \alpha_r$ are algebraically independent over $F$ if for $i = 1, 2, \ldots, r$, $\alpha_i$ is algebraically
independent over $F_{i-1}$.


PROPOSITION 9.14. Under the notation of the last definition, $F_r$ is independent of the
ordering of the $\alpha_1, \alpha_2, \ldots, \alpha_r$ and is hence denoted by $F(\alpha_1, \alpha_2, \ldots, \alpha_r)$.


2. Field embeddings


DEFINITION 9.15. Let $F$ and $E$ be fields. A ring homomorphism $\sigma : F \to E$ is automatically
injective and we will hence refer to it also as an embedding of $F$ in $E$. Since $\sigma(F)$ is a
subfield of $E$, the map $\sigma : F \to \sigma(F)$ is an isomorphism and hence invertible. If

\[ p(x) = a_n x^n + \ldots + a_0 \in F[x], \]

then we define the polynomial

\[ \sigma p(x) = \sigma(a_n)x^n + \ldots + \sigma(a_0) \in E[x]. \]




PROPOSITION 9.16. Let $F$ and $E$ be fields and $\sigma : F \to E$ an embedding. The induced
map $\sigma: F[x] \to \sigma(F)[x]$ is a $\mathbb{Q}$-algebra isomorphism. Further, if $p(x) \in F[x]$ is irreducible
over $F$, then $\sigma p(x) \in \sigma(F)[x]$ is irreducible over $\sigma(F)$.


Let $F$ and $L$ be fields and $f(x) \in F[x]$. Let $\alpha \in \mathbb{C}$ be algebraic over $F$. If $\sigma: F(\alpha) \to L$
is an embedding, then

\[ (\sigma f)(\sigma(\alpha)) = \sigma(f(\alpha)). \]

In particular, the embedding $\sigma$ of $F(\alpha)$ into $L$ is determined by the restriction of $\sigma$ to $F \cup \{\alpha\}$.

DEFINITION 9.17. Let $\sigma: F \to L$ and $\tau: E \to L$ be field embeddings and let $E$ be an
extension of $F$. We say that $\tau$ is an extension of $\sigma$ to $E$ or that $\sigma$ is a restriction of $\tau$ to $F$
if $\tau(f) = \sigma(f)$ for all $f \in F$.

THEOREM 9.18. Let $\sigma: F \to L$ be a field embedding. Let $p(x) \in F[x]$ be irreducible.
Let $\alpha \in \mathbb{C} - F$ be a root of $p(x)$ and let $\beta$ be a root of $(\sigma p)(x)$ in $L$. Then there exists an
embedding $\tau : F(\alpha) \to L$ which is an extension of $\sigma$ and satisfies $\tau(\alpha) = \beta$. Conversely, for
every extension $\tau$ of $\sigma$ to $F(\alpha)$, $\tau(\alpha)$ is a root of $(\sigma p)(x)$.

COROLLARY 9.19. Let $p(x)$ be an irreducible polynomial over the field $F$ and $\alpha \in \mathbb{C}$ a
root of $p(x)$. Let $\sigma: F \to \mathbb{C}$ be an embedding. Then the number of possible embeddings of
$F(\alpha)$ into $\mathbb{C}$ that extend $\sigma$ equals the degree of the polynomial $p(x)$ (which is the same as the
degree of the complex number $\alpha$ over the field $F$).


COROLLARY 9.20. Let $E$ be a finite extension of the field $F$ of degree $n$. Let $\sigma: F \to \mathbb{C}$
be an embedding. Then the number of extensions of $\sigma$ to an embedding $E \to \mathbb{C}$ equals $n$.
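Corollary 9.19 can be illustrated with $F = \mathbb{Q}$ and $p(x) = x^3 - 2$ (an example of our choosing). Writing the elements of $\mathbb{Q}(\alpha)$, $\alpha = \sqrt[3]{2}$, as $u_0 + u_1\alpha + u_2\alpha^2$, each of the three complex roots $\beta$ of $p(x)$ gives an embedding determined by $\alpha \mapsto \beta$. The sketch below checks numerically that these maps respect multiplication; the names are hypothetical.

```python
import cmath


def mul_mod_cube(u, v):
    """Product in Q(a) with a**3 = 2; elements are (u0, u1, u2)
    standing for u0 + u1*a + u2*a**2."""
    u0, u1, u2 = u
    v0, v1, v2 = v
    return (
        u0 * v0 + 2 * (u1 * v2 + u2 * v1),
        u0 * v1 + u1 * v0 + 2 * u2 * v2,
        u0 * v2 + u1 * v1 + u2 * v0,
    )


omega = cmath.exp(2j * cmath.pi / 3)
# The three roots of x**3 - 2: the possible images of a under an embedding.
images = [2 ** (1 / 3) * omega**k for k in range(3)]


def embed(u, beta):
    """Image of u0 + u1*a + u2*a**2 under the embedding sending a to beta."""
    return u[0] + u[1] * beta + u[2] * beta**2
```

There are exactly three such maps, matching the degree of $x^3 - 2$; only the one with $\beta = \sqrt[3]{2}$ takes values in $\mathbb{R}$.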


THEOREM 9.21. If $E$ is a finite field extension of $F$, then there exists an element $\gamma \in E$
such that $E = F(\gamma)$.


3. Splitting fields 


DEFINITION 9.22. Let $E$ be a finite extension of the field $F$. Let $\sigma$ be an embedding of
$F$ (into some field) and $\tau$ an extension of $\sigma$ to an embedding of $E$. If $\sigma$ is the identity map,
then it is convenient to say that $\tau$ is an embedding of $E$ over $F$ and that $\tau$ leaves $F$ fixed.


PROPOSITION 9.23. Let $\sigma$ be an embedding over $F$ of a finite extension $K$ of a field $F$.
If $\sigma(K) \subset K$, then $\sigma(K) = K$ and $\sigma$ is an automorphism of the field $K$.


PROPOSITION 9.24. (a) The set $G$ of all automorphisms of a field $K$ is a group under
composition.
(b) If $G$ is a group of automorphisms of a field $K$, then

\[ K^G = \{a \in K;\ \sigma(a) = a \text{ for all } \sigma \in G\} \]

is a subfield of $K$.

DEFINITION 9.25. A finite extension $K$ of a field $F$ is Galois if every embedding of $K$
over $F$ is an automorphism of $K$; it is a splitting field of the polynomial $p(x) \in F[x]$ if
$K = F(\alpha_1, \ldots, \alpha_n)$, where $\alpha_1, \ldots, \alpha_n$ are (all) the roots of $p(x)$.


THEOREM 9.26. A finite extension of a field $F$ is Galois if and only if it is the splitting
field of a polynomial $p(x) \in F[x]$.


THEOREM 9.27. Let $K$ be a Galois extension of a field $F$. If $p(x) \in F[x]$ is irreducible
(over $F$) and has one root in $K$, then all its roots are in $K$.




4. Galois extensions 


THEOREM 9.28. Let $K$ be a Galois extension of a field $F$. If $G$ is the group of automorphisms
of $K$ over $F$, then $F$ is the fixed field of $G$.


THEOREM 9.29. Let $K$ be a Galois extension of a field $F$ and let $G$ be the group of
automorphisms of $K$ over $F$. To each intermediate field $E$
(between $F$ and $K$) associate the subgroup $G_{K/E}$ of $G$ consisting
of those automorphisms that leave $E$ fixed. Then $K$ is Galois over $E$ and the map

\[ E \mapsto G_{K/E} \]

is a bijection from the set of intermediate fields onto the set of subgroups of $G$. Further
$E$ is the fixed field of $G_{K/E}$.


DEFINITION 9.30. If $K$ is a Galois extension of the field $F$, we call the group of automorphisms
$G_{K/F}$ the Galois group of $K$ over $F$. If $K$ is the splitting field of the polynomial
$P(x) \in F[x]$, then we also say that $G_{K/F}$ is the Galois group of $P(x)$.


PROPOSITION 9.31. Let $F$ be a field, $\alpha_1, \alpha_2, \ldots, \alpha_n \in \mathbb{C}$ and

\[ p(x) = (x - \alpha_1)(x - \alpha_2)\cdots(x - \alpha_n) \in F[x]. \]

Let $K = F(\alpha_1, \alpha_2, \ldots, \alpha_n)$ and $\sigma \in G_{K/F}$. Then $\{\sigma(\alpha_1), \sigma(\alpha_2), \ldots, \sigma(\alpha_n)\}$ is a permutation
$\pi_\sigma \in S(n)$ of $\{\alpha_1, \alpha_2, \ldots, \alpha_n\}$. The map that sends $\sigma \in G_{K/F}$ to $\pi_\sigma \in S(n)$ is an injective
group homomorphism.
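As an illustration of Proposition 9.31 (our example, checked only numerically), complex conjugation maps the splitting field $K$ of $x^3 - 2$ over $\mathbb{Q}$ into itself, hence restricts to an automorphism of $K$ over $\mathbb{Q}$, and permutes the three roots. The helper name `induced_permutation` is ours.

```python
import cmath

omega = cmath.exp(2j * cmath.pi / 3)
# The roots of x**3 - 2, indexed 0, 1, 2; roots[0] is the real cube root.
roots = [2 ** (1 / 3) * omega**k for k in range(3)]


def induced_permutation(sigma, roots, tol=1e-9):
    """Return the list perm with sigma(roots[i]) == roots[perm[i]],
    matching images to roots up to the tolerance tol."""
    perm = []
    for r in roots:
        image = sigma(r)
        perm.append(next(j for j, s in enumerate(roots) if abs(image - s) < tol))
    return perm


conjugation = induced_permutation(lambda z: z.conjugate(), roots)
```

Conjugation fixes the real root and interchanges the other two, so it corresponds to a transposition in $S(3)$.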


5. Quadratic, cubic and quartic extensions 


5.1. Linear extensions. The only irreducible monic polynomials of degree 1 over a field
$F$ are of the form $x + b$ with $b \in F$. Hence the only extension $E$ of $F$ with $[E : F] = 1$ is
$E = F$.


5.2. Quadratic extensions. Let $F$ be a field. An irreducible polynomial $p(x) = x^2 + bx + c$,
$b$ and $c \in F$, over $F$ has a splitting field $F(\alpha)$, where

\[ \alpha = \frac{-b + \sqrt{b^2 - 4c}}{2}. \]

We conclude that $F(\alpha)$ is Galois over $F$ and $G_{F(\alpha)/F}$ is cyclic of order 2. If we let $d = b^2 - 4c$
(this is the discriminant of the quadratic polynomial $p(x)$), then we see that $F(\alpha) = F(\sqrt{d})$.
Conversely, the polynomial $x^2 - d$ is irreducible over $F$ if and only if $d$ is not a square in $F$.²

² Whenever $d$ is a square in $F$, $F(\sqrt{d}) = F$, of course.


5.3. Cubic extensions. We start with a field $F$ and a cubic polynomial $p(x) \in F[x]$.
We have already seen several times that after a translation of the variable (the analogue of
completing the square), a monic cubic can be reduced to the form

\[ p(x) = x^3 + bx + c = (x - \alpha_1)(x - \alpha_2)(x - \alpha_3), \]

where $b$ and $c \in F$, but the complex roots $\alpha_i$ may or may not be elements of the field $F$. If
$p(x)$ has no root in $F$, then it is irreducible over $F$ (which we assume for the rest of this
subsection) and simple calculations show that

\[ -(\alpha_1 + \alpha_2 + \alpha_3) = 0, \quad \alpha_1\alpha_2 + \alpha_1\alpha_3 + \alpha_2\alpha_3 = b, \quad -\alpha_1\alpha_2\alpha_3 = c. \]

It is convenient to define

\[ \delta = (\alpha_2 - \alpha_1)(\alpha_3 - \alpha_1)(\alpha_3 - \alpha_2) \quad\text{and}\quad D = \delta^2, \]

and call $D$ the discriminant of the polynomial $p(x)$.

Let $K = F(\alpha_1, \alpha_2, \alpha_3)$ be the splitting field of $p(x)$ and $G$ its Galois group over $F$. The
group $G$ is isomorphic to a subgroup of the symmetric group $S(3)$. Since $K$ contains a root
(say $\alpha_1$) of the irreducible cubic $p(x)$, the degree $[F(\alpha_1) : F] = 3$ divides $[K : F]$; it follows that the
order of $G$, which equals $[K : F]$, is either 3 or 6; it cannot be 2. In
the first case $G$ is cyclic of order 3 and $K = F(\alpha_1)$. In the second case $G$ is isomorphic to
$S(3)$.


THEOREM 9.32. We have two mutually exclusive possibilities:
(a) If $D$ is a square in $F$, then $K$ has degree 3 over $F$, or
(b) If $D$ is not a square in $F$, then the group $G$ is isomorphic to $S(3)$.


THEOREM 9.33. $K = F(\sqrt{D}, \alpha_1)$.


EXAMPLE 9.34. Consider the polynomial $p(x) = x^3 - 3x + 1$. It has no roots over $\mathbb{Z}_2$;
hence no roots in $\mathbb{Z}$ and, since it is monic, no roots in $\mathbb{Q}$. Being a cubic, it is thus irreducible over $\mathbb{Q}$.
Its discriminant is $3^4$, a square in $\mathbb{Q}$. The
Galois group of $p(x)$ is thus cyclic of order 3 and its splitting field is $\mathbb{Q}(\alpha)$, where $\alpha$ is a root
of $p(x)$; for example, $e^{\frac{2\pi i}{9}}\left(1 + e^{\frac{14\pi i}{9}}\right)$ (see Example 8.1).
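Theorem 9.32 turns the computation of the Galois group of an irreducible cubic into a test on the discriminant. For $p(x) = x^3 + bx + c$ one has $D = -4b^3 - 27c^2$ (a standard formula, not derived in the text). The sketch below assumes integer $b$ and $c$ and that $p(x)$ is irreducible over $\mathbb{Q}$, which is not checked; the function names are ours.

```python
from math import isqrt


def cubic_discriminant(b, c):
    """Discriminant of x**3 + b*x + c."""
    return -4 * b**3 - 27 * c**2


def galois_group_order(b, c):
    """Order of the Galois group over Q of x**3 + b*x + c with integer
    b and c. Irreducibility over Q is assumed, not checked."""
    disc = cubic_discriminant(b, c)
    # An integer is a square in Q exactly when it is a perfect square.
    is_square = disc >= 0 and isqrt(disc) ** 2 == disc
    return 3 if is_square else 6
```

For $x^3 - 3x + 1$ the discriminant is $81 = 3^4$ and the group has order 3; for $x^3 - 2$ it is $-108$, not a square in $\mathbb{Q}$, and the group is isomorphic to $S(3)$.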


5.4. Quartic extensions. 


6. Nonsolvability 


DEFINITION 9.35. A Galois field extension whose Galois group is abelian is called an 
abelian extension. 


Let $K = F(\alpha)$ be a Galois extension of the field $F$, where $\alpha \in \mathbb{C} - F$. If $\sigma$ and $\tau$ are
automorphisms of $K$ over $F$, then $\sigma\tau = \tau\sigma$ if and only if $(\sigma\tau)(\alpha) = (\tau\sigma)(\alpha)$.
We begin the build-up to our main (nonsolvability) theorem.


THEOREM 9.36. Let $n$ be a positive integer and $\omega$ be an $n$-th root of unity. Let $F$ be a
field that does not contain $\omega$. Then $K = F(\omega)$ is an abelian extension of $F$.


THEOREM 9.37. Let $n$ be a positive integer and $F$ a field that contains the $n$-th roots of
unity. Let $\alpha \in \mathbb{C} - F$ with $\alpha^n \in F$. Then $K = F(\alpha)$ is an abelian extension of $F$.


DEFINITION 9.38. Let $F$ be a field and $f(x) \in F[x]$ be a polynomial of degree $\geq 1$. We
say that $f$ or $f(x)$ is solvable by radicals if the splitting field of $f$ is contained in a Galois
extension $K$ which has a sequence of subfields $\{F_i\}$ such that

\[ F = F_0 \subset F_1 \subset \ldots \subset F_r = K \]

with
(a) $F_1 = F(\omega)$ for some primitive $n$-th root of unity $\omega$, and
(b) for $0 < i < r$, $F_{i+1} = F_i(\alpha_i)$ with $\alpha_i \notin F_i$, $\alpha_i^d \in F_i$ and $d \mid n$.


REMARK 9.39. • Because the field inclusions in the above definition are proper,
we have to consider the possibility that $F$ already contains an appropriate $n$-th root
of unity. In this case condition (a) in the above definition is dropped and the indices
$i$ in condition (b) run from 0 (to $r - 1$).

• Degree 1 polynomials are, of course, solvable. For such polynomials ($f(x) = ax + b$,
$a$ and $b \in F$), $r = 0$.


We use the notation and concepts of our last definition. Since $d \mid n$, $\omega^{n/d}$ is a primitive $d$-th
root of unity. Hence $F_{i+1}$ is an abelian extension of $F_i$. Thus the extension $K$ of $F$, with
Galois group $G$, decomposes into a sequence of abelian extensions, and if we let $H_i$ be the Galois group of
$K$ over $F_i$, we get a sequence of groups that satisfies 16, so that $G$ is a solvable group. We
have thus established


THEOREM 9.40. If $f(x)$ is solvable by radicals, then its Galois group is solvable.

We can now state the main theorem of this chapter:


THEOREM 9.41. Let $\alpha_1, \alpha_2, \ldots, \alpha_n$ be algebraically independent over a field $F_0$, and let

\[ f(x) = \prod_{i=1}^{n} (x - \alpha_i) = x^n + a_{n-1}x^{n-1} + \ldots + a_0 \in \mathbb{C}[x]. \]

Let $F = F_0(a_{n-1}, \ldots, a_0)$ and $K = F(\alpha_n, \ldots, \alpha_1)$. Then $K$ is a Galois extension of $F$ with
Galois group $S(n)$.

We need to know that we can find a set of complex numbers $\alpha_1, \alpha_2, \ldots, \alpha_n$ that are algebraically
independent over a field $F_0 \subset \mathbb{C}$; for example, over $\mathbb{Q}$. This will follow from the fact that $\mathbb{Q}$
is countable, while $\mathbb{C}$ is not.


DEFINITION 9.42. Let $F$ be a field. We define the algebraic closure $\overline{F}$ of $F$ to be the set
of complex numbers that are algebraic over $F$.


PROPOSITION 9.43. If $F$ is a countable field, so are $\overline{F}$ and $F(x)$.


COROLLARY 9.44. For every positive integer $n$, there exist $n$ algebraically independent
complex numbers over $\mathbb{Q}$.


EXERCISES 


(1) In our discussion of quadratic extensions, we assumed that the polynomial $p(x) = x^2 + bx + c$
was irreducible over the field $F$. What happens when the polynomial is
reducible?

(2) Formulate and solve problems similar to the last question for cubic and quartic 
extensions. 


Bibliography 


[1] L. V. Ahlfors, Complex Analysis (third edition), McGraw-Hill, 1979.

[2] H. Anton and Ch. Rorres, Elementary Linear Algebra (Applications Version), 8th edition, John Wiley
and Sons, 2000.

[3] R. Corless, Essential Maple 7, Springer, 2002.

[4] H. Derksen, The fundamental theorem of algebra and linear algebra, Amer. Math. Monthly 110 (2003),
620-623.

[5] A. F. Coxford et al., Contemporary Mathematics in Context, A unified approach; Course 2, Part A,
Everyday Learning, 1997.

[6] J. P. Gilman, I. Kra, and R. Rodriguez, Complex Analysis, In the Spirit of Lipman Bers, Graduate
Texts in Mathematics, vol. 245, Springer-Verlag, 2007.

[7] J. F. Humphreys and M. Y. Prest, Numbers, Groups and Codes, Cambridge University Press, 1989.

[8] S. Lang, Algebraic Structures, Addison-Wesley, 1967.

[9] M. Spivak, Calculus, Publish or Perish, Inc.

[10] R. S. Wolf, Proof, Logic, and Conjecture: The Mathematician’s Toolbox, W. H. Freeman and Company,
1998.






